39 changes: 39 additions & 0 deletions .gitignore
@@ -0,0 +1,39 @@
# Binaries for programs and plugins
*.exe
*.exe~
*.dll
*.so
*.dylib
gcsb

# Test binary, built with `go test -c`
*.test

# Output of the go coverage tool, specifically when used with LiteIDE
*.out

# Dependency directories
vendor/

# Go workspace file
go.work

# IDE specific files
.idea/
.vscode/
*.swp
*.swo
*~

# OS specific files
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db
Thumbs.db

# Project specific
*.log
benchmark_results_*.txt
275 changes: 123 additions & 152 deletions README.md
@@ -1,200 +1,171 @@
# GCSB

- [GCSB](#gcsb)
  - [Quickstart](#quickstart)
    - [Create a test table](#create-a-test-table)
    - [Load data into table](#load-data-into-table)
    - [Run a load test](#run-a-load-test)
  - [Operations](#operations)
    - [Load](#load)
      - [Single table load](#single-table-load)
      - [Multiple table load](#multiple-table-load)
      - [Loading into interleaved tables](#loading-into-interleaved-tables)
    - [Run](#run)
      - [Single table run](#single-table-run)
      - [Multiple table run](#multiple-table-run)
      - [Running against interleaved tables](#running-against-interleaved-tables)
  - [Distributed testing](#distributed-testing)
  - [Configuration](#configuration)
  - [Roadmap](#roadmap)
    - [Not Supported (yet)](#not-supported-yet)
  - [Development](#development)
    - [Build](#build)
    - [Test](#test)

It's like YCSB but with more Google. A simple tool meant to generate load against Google Cloud Spanner databases. The primary goals of the project are:

- Write randomized data to tables in order to
  - Facilitate load testing
  - Initiate [database splits](https://cloud.google.com/spanner/docs/schema-and-data-model#database-splits)
- Generate read/write load against user provided schemas

## Quickstart

To initiate a simple load test against your Spanner instance using one of our test schemas:

### Create a test table

You can use your own schema if you'd prefer, but we provide a few test schemas to help you get started. First, create a table named `SingleSingers`:

```sh
gcloud spanner databases ddl update YOUR_DATABASE_ID --instance=YOUR_INSTANCE_ID --ddl-file=schemas/single_table.sql
```

# GCSB - Google Cloud Spanner Benchmark

A tool for benchmarking Google Cloud Spanner performance.

### Load data into table

Load some data into the table to seed the upcoming load test. In the example below, we load 10,000 rows of random data into the table `SingleSingers`:

```sh
gcsb load -p YOUR_GCP_PROJECT_ID -i YOUR_INSTANCE_ID -d YOUR_DATABASE_ID -t SingleSingers -o 10000
```

### Run a load test

Now you can perform a load test by using the `run` sub-command. The command below will generate 10,000 operations; 75% will be READ operations and 25% will be writes, performed over 50 threads.

```sh
gcsb run -p YOUR_GCP_PROJECT_ID -i YOUR_INSTANCE_ID -d YOUR_DATABASE_ID -t SingleSingers -o 10000 --reads 75 --writes 25 --threads 50
```
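As a mental model for how such a run is shaped (a sketch of the arithmetic behind the `-o`, `--reads`, `--writes`, and `--threads` flags, not GCSB's actual scheduler), 10,000 operations at a 75/25 read/write mix over 50 threads break down like this:

```go
package main

import "fmt"

// splitRun models how a run's total operations could be divided into
// reads and writes and spread across worker threads. Illustrative only;
// GCSB's real scheduler may distribute work differently.
func splitRun(total, readPct, writePct, threads int) (reads, writes, perThread int) {
	reads = total * readPct / 100
	writes = total * writePct / 100
	perThread = total / threads
	return
}

func main() {
	reads, writes, perThread := splitRun(10000, 75, 25, 50)
	// 7500 reads, 2500 writes, 200 operations per thread
	fmt.Println(reads, writes, perThread)
}
```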
## Operations

Tool usage is generally broken down into two categories: `load` and `run` operations.

### Load

GCSB provides batched loading functionality to facilitate load testing, as well as to assist with performing [database splits](https://cloud.google.com/spanner/docs/schema-and-data-model#database-splits) on your tables.

At runtime, GCSB will detect the schema of the table you're loading data into and create data generators appropriate for the column types in your database. Each type of generator has configurable functionality that allows you to refine the type, length, or range of the data the tool generates. For in-depth information on the various configuration values, please read the comments in [example_gcsb.yaml](example_gcsb.yaml).

## Features

- Configurable workload patterns
- Support for various Spanner configurations
- Real-time metrics and reporting
- Commit delay optimization support
- Easy instance creation and testing setup
- Automated dual-region testing

## Prerequisites

- Go 1.19 or later
- Google Cloud SDK
- Authenticated gcloud session
- Project with Spanner API enabled
- Permission to create Spanner instances

## Installation

There are two ways to install GCSB:

### Option 1: Install from source (Recommended)

```bash
# Clone the repository
git clone https://github.com/cloudspannerecosystem/gcsb.git
cd gcsb

# Build the binary
go build -o gcsb

# Verify the installation
./gcsb --help
```

### Option 2: Install using go install (Advanced)

```bash
go install github.com/cloudspannerecosystem/gcsb@latest
```

Note: If using `go install`, you'll need to update the paths in the test scripts to use the full path to your GCSB binary (usually in `$GOPATH/bin/gcsb`).

#### Single table load

By default, GCSB will detect the table schema and create default random data generators based on the columns it finds. In order to tune the values the generator creates, you must create override configurations in the gcsb.yaml file. Please see that file's documentation for more information.

```sh
gcsb load -t TABLE_NAME -o NUM_ROWS
```

Additionally, please see `gcsb load --help` for additional configuration options.

#### Multiple table load

Similar to the above [Single table load](#single-table-load), you may specify multiple tables by repeating the `-t TABLE_NAME` argument. By default, the number of operations is applied to each table. For example, specifying 2 tables with 1,000 operations will yield 2,000 total operations: 1,000 per table.

```sh
gcsb load -t TABLE1 -t TABLE2 -o NUM_ROWS
```

Operations per table can be configured in the yaml configuration. For example:

```yaml
tables:
  - name: TABLE1
    operations:
      total: 500
  - name: TABLE2
    operations:
      total: 500
```
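To make the per-table behavior concrete (a sketch of the documented semantics, not GCSB's actual resolution code): the `-o` flag value applies to every table unless a table carries its own `operations.total` in gcsb.yaml.

```go
package main

import "fmt"

// tableOps models per-table operation counts: the -o flag total applies
// to each table unless the yaml config overrides that table's total.
// Illustration of the documented behavior, not GCSB source.
func tableOps(flagTotal int, overrides map[string]int, tables []string) map[string]int {
	out := make(map[string]int)
	for _, t := range tables {
		if n, ok := overrides[t]; ok {
			out[t] = n
		} else {
			out[t] = flagTotal
		}
	}
	return out
}

func main() {
	// TABLE2 has a yaml override of 500; TABLE1 falls back to -o 1000.
	ops := tableOps(1000, map[string]int{"TABLE2": 500}, []string{"TABLE1", "TABLE2"})
	fmt.Println(ops["TABLE1"], ops["TABLE2"])
}
```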
#### Loading into interleaved tables

Loading data into interleaved tables is not supported yet. If you want to create splits in the database, you can load data into parent tables.

### Run

#### Single table run

By default, GCSB will detect the table schema and create default random data generators based on the columns it finds. In order to tune the values the generator creates, you must create override configurations in the gcsb.yaml file. Please see that file's documentation for more information.

```sh
gcsb run -p YOUR_GCP_PROJECT_ID -i YOUR_INSTANCE_ID -d YOUR_DATABASE_ID -t SingleSingers -o 10000 --reads 75 --writes 25 --threads 50
```

Additionally, please see `gcsb run --help` for additional configuration options.

#### Multiple table run

Similar to the above [Single table run](#single-table-run), you may specify multiple tables by repeating the `-t TABLE_NAME` argument. By default, the number of operations is applied to each table. For example, specifying 2 tables with 1,000 operations will yield 2,000 total operations: 1,000 per table.

```sh
gcsb run -t TABLE1 -t TABLE2 -o NUM_ROWS
```

Operations per table can be configured in the yaml configuration. For example:

```yaml
tables:
  - name: TABLE1
    operations:
      total: 500
  - name: TABLE2
    operations:
      total: 500
```

## Quick Start

1. Ensure you have the Google Cloud SDK installed and are authenticated:

   ```bash
   # Install Google Cloud SDK if not already installed
   # Visit https://cloud.google.com/sdk/docs/install for installation instructions

   # Authenticate with Google Cloud
   gcloud auth login

   # Set your project ID
   gcloud config set project YOUR_PROJECT_ID
   ```

2. Build the GCSB binary (if not already done):

   ```bash
   go build -o gcsb
   ```

3. Create test instances using the provided script:

   ```bash
   chmod +x create_test_instances.sh
   ./create_test_instances.sh
   ```

The script will:

- Show available Spanner configurations
- Let you select which configurations to create
- Create instances and databases with test schemas
- Provide example commands for running benchmarks

Available configurations include:

- Single-region (us-central1)
- Multi-region (nam3)
- Dual-region (australia-southeast1+australia-southeast2)
- Both STANDARD and ENTERPRISE versions

### Dual-Region Testing

For testing dual-region configurations with Enterprise Plus edition:

```bash
# First, make sure you've built the binary (if not already done)
go build -o gcsb

# Then run the test script
chmod +x run_dual_region_test.sh
./run_dual_region_test.sh
```

This script will:

- Create an Enterprise Plus instance in a dual-region configuration
- Create a database with a test schema
- Load initial test data
- Run a benchmark with commit delay optimization
- Save results to a timestamped file

The test configuration includes:

- 4 concurrent threads
- 100,000 operations (~5GB of data)
- 100ms commit delay
- 100% write operations
- Large string fields to help reach the target data size

#### Running against interleaved tables

Run operations against interleaved tables are supported only against the apex table.

Using our [test INTERLEAVE schema](schemas/multi_table.sql), we see an INTERLEAVE relationship between the `Singers`, `Albums`, and `Songs` tables. If you try to run against any child table, an error will occur:

```sh
gcsb run -t Songs -o 10

unable to execute run operation: can only execute run against apex table (try 'Singers')
```

## Distributed testing

GCSB is intended to run in a stateless manner. This design choice allows massive horizontal scaling of gcsb to stress your database to its absolute limits. During development we identified Kubernetes as the preferred tool for the job. We provide two separate tutorials for running gcsb inside of Kubernetes:

- [GKE](docs/GKE.md) - For running GCSB inside GKE using a service account key. This can also be used for non-GKE clusters, as it contains instructions for mounting a service account key into the container.
- [GKE with Workload Identity](docs/workload_identity.md) - For running GCSB inside a GKE cluster that has workload identity turned on. This is most useful in organizations with security policies that prevent generating or downloading a service account key.
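One way to picture the stateless scale-out model (purely illustrative; this is not how GCSB replicas coordinate): each replica can derive its own share of the work from nothing but its index and the replica count, with no shared state between workers.

```go
package main

import "fmt"

// sliceFor computes the half-open range of row ordinals a given worker
// replica would cover, using only its index and the replica count.
// A sketch of stateless work partitioning, not GCSB code.
func sliceFor(workerIdx, workerCount, totalRows int) (start, end int) {
	per := totalRows / workerCount
	start = workerIdx * per
	end = start + per
	if workerIdx == workerCount-1 {
		end = totalRows // last worker absorbs the remainder
	}
	return
}

func main() {
	// Worker 2 of 4 over 1,000 rows covers [500, 750).
	s, e := sliceFor(2, 4, 1000)
	fmt.Println(s, e)
}
```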
## Configuration

The tool can receive configuration input in several different ways. It will load the file `gcsb.yaml` if it detects it in the current working directory. Alternatively, you can use the global flag `-c` to specify a path to the configuration file. Each sub-command has a number of configuration flags relevant to that operation. These values are bound to their counterparts in the yaml configuration file and take precedence over the config file; think of them as overrides. The same is true for environment variables.

Please note that, at present, the yaml configuration file is the only way to specify generator overrides for data loading and write operations. Without this file, the tool will use a random data generator appropriate for the table schema it detects at runtime.

For in-depth information on the various configuration values, please read the comments in [example_gcsb.yaml](example_gcsb.yaml).

### Supported generator type

The tool supports the following generator type in the configuration.

| type | description |
|------|-------------|
| `UUID_V4` | Generates a UUID v4 value. Supported column types are `STRING` and `BYTES`. Note that UUID is automatically inferred for a `STRING(36)` column without any configuration. |

## Troubleshooting

Common issues and solutions:

1. "cannot execute binary file: Exec format error"
   - Make sure you've built the binary for your system using `go build -o gcsb`
   - If using `go install`, update the script to use the full path to your GCSB binary (usually in `$GOPATH/bin/gcsb`)

2. "Permission denied" when running scripts
   - Make sure the scripts are executable: `chmod +x *.sh`

3. "Project not found" or "Permission denied" for Spanner operations
   - Verify your gcloud authentication: `gcloud auth login`
   - Check your project ID: `gcloud config get-value project`
   - Ensure you have the necessary Spanner permissions

4. "Instance already exists" error
   - The script will automatically use existing instances if they match the naming pattern
   - To create a new instance, either delete the existing one or modify the instance name in the script

## Configuration Options
### Workload Configuration

- `threads`: Number of concurrent worker threads
- `operations`: Total number of operations to perform
- `commitDelay`: Maximum commit delay for write operations (e.g., "100ms")
- `tables`: List of tables to benchmark
  - `name`: Table name
  - `writeRatio`: Ratio of write operations (0.0 to 1.0)
  - `readRatio`: Ratio of read operations (0.0 to 1.0)
  - `rowCount`: Number of rows to operate on

### Data Generation

- `column`: Column name
- `type`: Data type (uuid, string, int)
- For strings:
  - `length`: Maximum string length
- For integers:
  - `min`: Minimum value
  - `max`: Maximum value

## Roadmap

### Not Supported (yet)

- [ ] Interleaved tables for Load and Run phases
- [ ] Generating read operations utilizing [ReadByIndex](https://cloud.google.com/spanner/docs/samples/spanner-read-data-with-index#spanner_read_data_with_index-go)
- [ ] Generating NULL values for load operations. If a column is NULLable, gcsb will still generate a value for it.
- [ ] JSON column types
- [ ] STRUCT objects
- [ ] VIEWS
- [ ] Inserting data across multiple tables in the same transaction
- [ ] SCAN or DELETE operations
- [ ] Tables with foreign key relationships
- [ ] Testing multiple tables at once

## Development

### Build

```sh
make build
```

### Test

```sh
make test
```

## Performance Optimization

The tool supports commit delay optimization for write operations. This feature:

- Buffers mutations in transactions
- Delays commits to improve throughput
- Reduces the number of round trips to Spanner

To enable commit delay, set the `commitDelay` parameter in your configuration:

```yaml
workload:
  commitDelay: 100ms # Maximum delay for commits
```

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.