Benchmark the search performance of Approximate Nearest Neighbor (ANN) algorithms implemented in various systems. This repository contains a Python CLI tool to evaluate and compare the efficiency and accuracy of ANN searches across different platforms.
Approximate Nearest Neighbor (ANN) search algorithms are essential for searching high-dimensional data, enabling fast and resource-efficient retrieval of similar items from large datasets. This benchmarking suite aims to provide an empirical basis for comparing the performance of several popular ANN-enabled search systems.
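As a point of reference for what these engines approximate: exact nearest-neighbor search scores a query against every indexed vector, which is precise but scales linearly with corpus size. The snippet below is purely illustrative (NumPy is assumed only for this example; it is not a requirement of the tool), and the exact top-k it produces is the kind of ground truth the precision metrics are measured against.

```python
# Illustrative only: exact (brute-force) top-10 retrieval with inner-product
# scoring, i.e. the ground truth that ANN engines approximate.
import numpy as np

rng = np.random.default_rng(0)
corpus = rng.standard_normal((10_000, 768)).astype(np.float32)  # indexed vectors
query = rng.standard_normal(768).astype(np.float32)

scores = corpus @ query               # inner-product similarity
top_10 = np.argsort(-scores)[:10]     # exact top-10 ids; O(N * d) per query
print(top_10)
```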
Before running the benchmarks, ensure you have the following installed:
- Docker
- Python 3.10 or higher
- uv (Python package manager)
Install uv (if not already installed):
curl -LsSf https://astral.sh/uv/install.sh | sh
Clone the repository and install dependencies:
git clone https://github.com/codelibs/search-ann-benchmark.git
cd search-ann-benchmark
uv sync
Download the dataset:
bash scripts/setup.sh
For GitHub Actions (smaller dataset):
bash scripts/setup.sh gha
# Run Qdrant benchmark with default settings
uv run search-ann-benchmark run qdrant
# Run Elasticsearch with specific configuration
uv run search-ann-benchmark run elasticsearch --target 1m-768-m48-efc200-ef100-ip
# Run with quantization
uv run search-ann-benchmark run elasticsearch --quantization int8 --variant int8
# Skip filtered search benchmark
uv run search-ann-benchmark run chroma --no-filter

# List available engines
uv run search-ann-benchmark list-engines

# List available benchmark targets
uv run search-ann-benchmark list-targets

# Show the configuration for an engine and target
uv run search-ann-benchmark show-config qdrant --target 100k-768-m32-efc200-ef100-ip

# Display previously saved results
uv run search-ann-benchmark show-results results.json
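These commands can also be scripted, for example to benchmark several engines back to back. A minimal sketch using subprocess; the engine names and CLI invocation mirror the examples above:

```python
# Sketch: drive the benchmark CLI for several engines in sequence.
import subprocess

for engine in ["qdrant", "elasticsearch", "chroma"]:
    subprocess.run(
        ["uv", "run", "search-ann-benchmark", "run", engine],
        check=True,
    )
```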
Available benchmark targets:

| Name | Index Size | HNSW M | Description |
|---|---|---|---|
| 100k-768-m32-efc200-ef100-ip | 100,000 | 32 | Small dataset for quick testing |
| 1m-768-m48-efc200-ef100-ip | 1,000,000 | 48 | Medium dataset |
| 5m-768-m48-efc200-ef100-ip | 5,000,000 | 48 | Full dataset |
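The target names encode the index parameters. The reading below is an interpretation based on the naming pattern, not something the tool defines for users: index size, vector dimension, HNSW M, efConstruction, search-time ef, and distance metric (ip = inner product).

```python
# Hypothetical helper: decode a target name such as "1m-768-m48-efc200-ef100-ip".
# The field meanings are assumed from the naming pattern, not part of this tool's API.
def parse_target(name: str) -> dict:
    size, dim, m, efc, ef, metric = name.split("-")
    return {
        "index_size": size,               # "100k", "1m", "5m"
        "dimension": int(dim),            # 768
        "hnsw_m": int(m[1:]),             # "m48" -> 48
        "ef_construction": int(efc[3:]),  # "efc200" -> 200
        "ef_search": int(ef[2:]),         # "ef100" -> 100
        "metric": metric,                 # "ip" = inner product
    }

print(parse_target("1m-768-m48-efc200-ef100-ip"))
```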
Different engines support different quantization modes:
- Qdrant: none, int8
- Elasticsearch: none, int4, int8, bbq
- OpenSearch: none (supports faiss engine variant)
- Weaviate: none, pq
- pgvector: vector, halfvec
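For background, int8-style scalar quantization stores each vector component in a single byte instead of a 32-bit float, roughly quartering vector memory at some cost in accuracy. The sketch below illustrates the general idea only; it is not how any particular engine implements its quantization:

```python
# Illustrative min/max scalar quantization of a float32 vector to 8-bit codes.
# Real engines use more careful schemes (per-segment ranges, re-ranking, etc.).
import numpy as np

vec = np.random.default_rng(1).standard_normal(768).astype(np.float32)

lo, hi = float(vec.min()), float(vec.max())
scale = (hi - lo) / 255.0
codes = np.round((vec - lo) / scale).astype(np.uint8)  # 1 byte per component
restored = codes.astype(np.float32) * scale + lo        # lossy reconstruction

print("max reconstruction error:", float(np.abs(vec - restored).max()))
```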
The repository is organized as follows:

search-ann-benchmark/
├── src/search_ann_benchmark/
│ ├── __init__.py
│ ├── cli.py # CLI entry point
│ ├── config.py # Configuration classes
│ ├── runner.py # Benchmark orchestration
│ ├── core/
│ │ ├── base.py # Abstract engine interface
│ │ ├── docker.py # Docker management
│ │ ├── embedding.py # Embedding loader
│ │ └── metrics.py # Metrics calculation
│ └── engines/
│ ├── qdrant.py
│ ├── elasticsearch.py
│ ├── opensearch.py
│ ├── milvus.py
│ ├── weaviate.py
│ ├── vespa.py
│ ├── pgvector.py
│ ├── chroma.py
│ ├── clickhouse.py
│ ├── lancedb.py
│ ├── redisstack.py
│ └── vald.py
├── tests/
├── scripts/
│ ├── setup.sh # Dataset download
│ └── get_hardware_info.sh
└── .github/workflows/ # CI workflows
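The core/metrics.py module aggregates the latency and precision statistics reported in the results. Precision is presumably the usual ANN-benchmark notion: the overlap between an engine's top-k hits and the exact top-k ground truth. A minimal sketch of that calculation (not the repository's actual code):

```python
# Illustrative precision@k: the fraction of returned ids that appear in the
# exact (brute-force) top-k.
def precision_at_k(approx_ids: list[int], exact_ids: list[int], k: int) -> float:
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k

print(precision_at_k([3, 7, 1, 9], [3, 1, 7, 4], k=4))  # -> 0.75
```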
Benchmark results are saved to results.json with the following structure:
{
"variant": "",
"target": "100k-768-m32-efc200-ef100-ip",
"version": "1.13.6",
"settings": { ... },
"results": {
"indexing": {
"execution_time": 123.45,
"process_time": 100.23,
"container": { ... }
},
"top_10": {
"num_of_queries": 10000,
"took": { "mean": 5.2, "std": 1.1, ... },
"hits": { ... },
"precision": { "mean": 0.95, ... }
},
"top_100": { ... },
"top_10_filtered": { ... },
"top_100_filtered": { ... }
},
"timestamp": "2024-01-01T00:00:00"
}
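The show-results command summarizes a saved file, and the same file can also be inspected directly. A small sketch that follows the key names in the example above (other runs may contain different sections):

```python
# Sketch: print a short summary of a results.json file, using the key names
# from the example structure above.
import json

with open("results.json") as f:
    data = json.load(f)

print("target:", data["target"], "| version:", data["version"])
print("indexing execution_time:", data["results"]["indexing"]["execution_time"])
for section in ("top_10", "top_100", "top_10_filtered", "top_100_filtered"):
    r = data["results"].get(section)
    if r:
        print(section, "- mean took:", r["took"]["mean"],
              "mean precision:", r["precision"]["mean"])
```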
Run the test suite and code quality checks:

uv run pytest
uv run ruff check --fix src tests
uv run ruff format src tests
uv run mypy src

To update an engine version, modify the ENGINE_VERSION in:
- The engine config class in src/search_ann_benchmark/engines/<engine>.py
- The corresponding workflow in .github/workflows/run-<engine>-linux.yml
For a comparison of the results, including response times and precision metrics for the different ANN algorithms, see the Benchmark Results Page.
We welcome contributions! If you have suggestions for additional benchmarks, improvements to existing ones, or fixes for any issues, please feel free to open an issue or submit a pull request.
This project is licensed under the Apache License 2.0.