STARLING - prediction of disordered protein ensembles from sequence

About

Last updated April 15th 2026

STARLING (conSTruction of intrinsicAlly disoRdered proteins ensembles efficientLy vIa multi-dimeNsional Generative models) is a latent-space probabilistic denoising diffusion model for predicting coarse-grained ensembles of intrinsically disordered regions.

STARLING was developed by Borna Novak and Jeff Lotthammer in the Holehouse lab (with some occasional help from Ryan and Alex, as is their wont).

For more information, please take a look at our paper!

Novak, B., Lotthammer, J. M., Emenecker, R. J. & Holehouse, A. S. Accurate predictions of disordered protein ensembles with STARLING. Nature 652, 240–250 (2026).

Documentation

Detailed documentation is provided on readthedocs, although this readme is probably enough to do most things.

https://idptools-starling.readthedocs.io/en/latest/

Colab notebook

A Google Colab notebook for predicting ensembles and performing rudimentary analysis is available here.

Installation

STARLING is available on GitHub (bleeding edge) and on PyPI (stable).

We recommend creating a fresh conda environment for STARLING (although in principle, there's nothing special about the STARLING environment).

conda create -n starling python=3.11 -y
conda activate starling

You can then install STARLING from PyPI using pip (or uv):

pip install idptools-starling

Or you can clone and install the bleeding-edge version from GitHub:

pip install git+https://github.com/idptools/starling.git

To check that STARLING has been installed correctly, run

starling --help

A Docker image is also available — see the Docker documentation for details.

Quickstart

The easiest way to use STARLING for ensemble generation is with the starling command-line tool.

starling <amino acid sequence> -c 400 --outname my_cool_idr -r

Example:

starling MDVFMKGLSKAKEGVVAAAEKTKQGVAEAAGKTKEGVLYVGSKTKEGVVHGVATVAEKTKEQVTNVGGAVVTGVTAVAQKTVEGAGSIAAATGFVKKDQLGKNEEGAPQEGILEDMPVDPDNEAYEMPSEEGYQDYEPEA --outname synuclein -r

This will generate three files:

synuclein.starling — the full STARLING ensemble file. This holds all the information associated with the ensemble.
synuclein_STARLING.pdb — the topology file for the ensemble.
synuclein_STARLING.xtc — the trajectory file for the ensemble.

By default, STARLING generates 400 conformations — to change the number of conformations, use the -c flag (e.g., -c 1000 would generate an ensemble with 1000 conformations).

Performance

STARLING is VERY fast on GPUs and — honestly — VERY fast on Apple Silicon as well. It is a bit slower on CPUs, but we're talking minutes instead of seconds for ensemble generation.

Command-Line Interface (CLI)

STARLING installs several command-line tools. Below is a complete reference for all of them.

`starling` — Ensemble Generation

The main CLI tool. Generates conformational ensembles from amino acid sequences.

starling <input> [options]

Input formats: a raw amino acid sequence, a .fasta file, a .tsv file (name<TAB>sequence), or a .seq.in file.
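
If your sequences live in a Python dictionary, a small helper can write them out in the tab-separated layout described above. This is a minimal sketch using only the standard library. The function name `write_sequence_tsv` and the file name are illustrative and not part of STARLING, and the sketch assumes one `name<TAB>sequence` record per line with no header row.

```python
import csv
import os
import tempfile


def write_sequence_tsv(sequences, path):
    """Write {name: sequence} pairs as name<TAB>sequence lines.

    Assumption: no header row, one record per line.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for name, sequence in sequences.items():
            # Whitespace inside a name would break the two-column layout,
            # so this sketch rejects it (our choice, not a STARLING rule).
            if any(ch.isspace() for ch in name):
                raise ValueError(f"name contains whitespace: {name!r}")
            writer.writerow([name, sequence.strip().upper()])


sequences = {
    "seq1": "MKVIFLAVLGLGIVVTTVLY",
    "seq2": "MDVFMKGLSKAKEGVVAAAEKTKQGVAE",
}
tsv_path = os.path.join(tempfile.mkdtemp(), "sequences.tsv")
write_sequence_tsv(sequences, tsv_path)

# Read the file back to show exactly what was written.
with open(tsv_path) as handle:
    tsv_lines = handle.read().splitlines()
print(tsv_lines)
```

The resulting `.tsv` file can then be passed to `starling` as the input argument.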

Examples:

# Single sequence, 400 conformations with 3D structures
starling MKVIFLAVLGLGIVVTTVLY -c 400 -r --outname my_protein

# From a FASTA file, using GPU
starling proteins.fasta -c 200 -d cuda:0 -r -o ./results

# Print STARLING configuration info
starling --info

Options

Flag	Type	Default	Description
`user_input`	positional	—	Sequence string, FASTA file, TSV file, or `.seq.in` file
`-c, --conformations`	int	400	Number of conformations to generate
`--steps`	int	30	Number of DDIM denoising steps
`-d, --device`	str	auto	Device: `cpu`, `cuda:0`, `cuda:1`, `mps`, etc.
`-b, --batch_size`	int	100	Batch size for sampling
`-o, --output_directory`	str	`.`	Output directory for saving results
`--outname`	str	auto	Override output filename prefix (single sequence only)
`-r, --return_structures`	flag	off	Generate PDB + XTC 3D structures
`--ionic_strength`	int	150	Solvent ionic strength in mM (20, 150, or 300)
`--num-cpus`	int	auto	Max CPUs for MDS reconstruction
`--num-mds-init`	int	4	Number of parallel MDS initializations
`-v, --verbose`	flag	off	Enable verbose output
`--disable_progress_bar`	flag	off	Hide progress bars
`--info`	flag	—	Print STARLING configuration and exit
`--version`	flag	—	Print version and exit
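
If you want to drive the CLI from a script, the flags in the table above can be assembled into an argument list. The sketch below only builds and prints the command; `build_starling_command` is a hypothetical helper, not something STARLING ships, and it covers just a handful of the documented flags.

```python
import shlex


def build_starling_command(user_input, conformations=400, outname=None,
                           return_structures=False, device=None):
    """Assemble an argument list for the `starling` CLI.

    Only flags listed in the options table are used:
    -c, -d, --outname and -r.
    """
    command = ["starling", user_input, "-c", str(conformations)]
    if device is not None:
        command += ["-d", device]
    if outname is not None:
        command += ["--outname", outname]
    if return_structures:
        command.append("-r")
    return command


command = build_starling_command(
    "MKVIFLAVLGLGIVVTTVLY",
    conformations=200,
    outname="my_protein",
    return_structures=True,
)
print(shlex.join(command))

# To actually run it (requires STARLING to be installed):
#   import subprocess
#   subprocess.run(command, check=True)
```

Passing the command as a list (rather than a single string) avoids shell-quoting problems when file names contain spaces.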

Output files

File	Description
`*.starling`	Binary ensemble archive (distance maps + metadata)
`*_STARLING.pdb`	PDB topology (when `-r` is used)
`*_STARLING.xtc`	XTC trajectory with all conformations (when `-r` is used)

`starling-benchmark` — Performance Benchmarking

Profile model throughput and measure performance across different configurations.

starling-benchmark [options]

Examples:

# Default benchmark sweep (10 to 1000 conformations)
starling-benchmark --device cuda:0

# Single run with 500 conformations and model compilation
starling-benchmark --device cuda:0 --single-run 500 --compile

Options

Flag	Type	Default	Description
`--device`	str	auto	Device for benchmarking
`--batch-size`	int	100	Batch size
`--steps`	int	30	Diffusion steps
`--sequence`	str	alpha-synuclein	Test sequence (default: 140 aa)
`--cooltime`	int	20	Cooldown seconds between runs
`--single-run`	int	0	Single test with N conformations (0 = sweep series)
`--compile`	flag	off	Enable PyTorch model compilation (CUDA only)

File Conversion Tools

STARLING ships with several converters for working with .starling ensemble archives.

`starling2pdb` — Convert to PDB

starling2pdb my_ensemble.starling -o ./output

Generates a multi-model PDB trajectory file.

`starling2xtc` — Convert to XTC

starling2xtc my_ensemble.starling -o ./output

Generates a PDB topology file plus a compressed XTC trajectory file.

`starling2numpy` — Convert to NumPy

starling2numpy my_ensemble.starling -o ./output

Exports the raw distance maps as a NumPy .npy array with shape (n_conformations, n_residues, n_residues).

`starling2sequence` — Print sequence

starling2sequence my_ensemble.starling

Prints the amino acid sequence stored in the .starling archive to stdout.

`starling2info` — Print ensemble metadata

starling2info my_ensemble.starling

Displays metadata about the ensemble, including creation date, sequence, number of conformations, radius of gyration, end-to-end distance, and model weights used.

`starling2starling` — Repair/validate an archive

# Check for errors
starling2starling my_ensemble.starling --error-check

# Check and remove problematic conformations
starling2starling my_ensemble.starling --error-check --remove-errors -o fixed_

# Overwrite the original file
starling2starling my_ensemble.starling --error-check --remove-errors --overwrite

`numpy2starling` — Restore from NumPy

numpy2starling distance_maps.npy -s MKVIFLAVLGLGIVVTTVLY -o ./output

Converts a NumPy distance map array and a sequence back into a .starling archive. Supports optional --build-structures to reconstruct 3D coordinates, and -x / -p to attach existing XTC/PDB trajectories.

`xtc2starling` — Convert XTC trajectory to STARLING

xtc2starling --xtc trajectory.xtc --pdb topology.pdb -o ./output

Converts an existing XTC trajectory and PDB topology into a .starling archive.

Converter summary

Command	Input	Output	Description
`starling2pdb`	`.starling`	`.pdb`	Multi-model PDB trajectory
`starling2xtc`	`.starling`	`.pdb` + `.xtc`	Topology + compressed trajectory
`starling2numpy`	`.starling`	`.npy`	Raw distance maps as NumPy array
`starling2sequence`	`.starling`	stdout	Print amino-acid sequence
`starling2info`	`.starling`	stdout	Print metadata (version, date, Rg, etc.)
`starling2starling`	`.starling`	`.starling`	Re-save with optional error removal
`numpy2starling`	`.npy`	`.starling`	Restore archive from NumPy
`xtc2starling`	`.xtc` + `.pdb`	`.starling`	Convert MD trajectory to STARLING

`starling-search` — Sequence Search

STARLING includes a FAISS-based similarity search engine that uses ensemble-aware sequence embeddings. It has two subcommands: build and query.

`starling-search query` — Find similar sequences

Search the pre-built FAISS index for sequences with similar ensemble properties.

starling-search query \
  --seq MDVFMKGLSKAKEGVVAAAEKTKQGVAEAAGKT \
  --k 20 \
  --nprobe 128 \
  --exclude-exact \
  --out search_results

# With filtering by sequence identity and length
starling-search query \
  --seq MKVIFLAVLGLGIVVTTVLY \
  --k 50 \
  --sequence-identity-max 0.9 \
  --length-min 40 \
  --length-max 800 \
  --rerank \
  --out-format csv \
  --out filtered_results

Query options

Flag	Type	Default	Description
`--index`	str	`default`	FAISS index path; `default` auto-downloads the pre-built index
`--seq`	str	—	Query sequence(s), can be specified multiple times
`--k`	int	10	Number of nearest neighbors to return
`--nprobe`	int	64	FAISS probe count (higher = slower but more accurate)
`--metric`	str	`cosine`	Distance metric: `cosine` or `l2`
`--exclude-exact`	flag	on	Skip exact sequence matches in results
`--sequence-identity-max`	float	—	Maximum sequence identity threshold
`--identity-denominator`	str	`query`	How to compute identity: `query`, `target`, `max`, `min`, `avg`
`--length-min`	int	—	Minimum target sequence length
`--length-max`	int	—	Maximum target sequence length
`--max-cosine-similarity`	float	—	Pre-filter upper bound on cosine similarity
`--min-l2-distance`	float	—	Pre-filter lower bound on L2 distance
`--rerank`	flag	on	Re-embed top hits with full encoder for more accurate ranking
`--rerank-batch-size`	int	64	Batch size for reranking
`--rerank-device`	str	auto	Device for reranking
`--rerank-ionic-strength`	int	auto	Ionic strength for reranking
`--device`	str	`cuda:0`	Device for query embedding
`--batch-size`	int	256	Batch size for embedding
`--ionic-strength`	int	150	Ionic strength in mM for encoding
`-o, --out`	str	`nearest_neighbors`	Output file basename
`--out-format`	str	`csv`	Output format: `csv` or `jsonl`
`--verbose`	flag	on	Verbose logging
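
The `--metric` option accepts `cosine` or `l2`. As a general point about these two metrics (not a statement about how STARLING stores its embeddings), they rank neighbors identically when vectors are unit-normalized, because the squared L2 distance equals `2 - 2 * cosine_similarity`. The toy vectors below are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


# Toy "embeddings" (hypothetical values).
query = normalize([1.0, 2.0, 2.0])
target = normalize([2.0, 1.0, 2.0])

cos = cosine_similarity(query, target)
l2 = l2_distance(query, target)

# For unit vectors: l2**2 == 2 - 2*cos (up to floating-point error).
identity_gap = abs(l2 ** 2 - (2.0 - 2.0 * cos))
print(cos, l2, identity_gap)
```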

`starling-search build` — Build a custom FAISS index

Build a FAISS index from pre-tokenized sequences (advanced usage).

starling-search build \
  --root /data/corpus \
  --tokens /data/corpus/tokens \
  --index /indexes/my_index.faiss \
  --sample-size 1000000 \
  --nlist 32768 \
  --use-gpu

Build options

Flag	Type	Default	Description
`--root`	str	required	Root data directory
`--index`	str	required	Output FAISS index path
`--tokens`	str	required	Directory with pre-tokenized sequences
`--metric`	str	`cosine`	Distance metric: `cosine` or `l2`
`--sample-size`	int	655360	Training sample size
`--nlist`	int	16384	FAISS IVF nlist parameter
`--m`	int	64	HNSW M parameter
`--nbits`	int	8	Quantization bits
`--add-batch-size`	int	100000	Batch size for adding vectors
`--nprobe`	int	16	FAISS probe count
`--use-gpu`	flag	on	Use GPU for index building
`--gpu-device`	int	0	GPU device ID
`--gpu-fp16-lut`	flag	on	Use FP16 lookup tables on GPU
`--opq`	flag	off	Enable Optimized Product Quantization
`--compress`	flag	off	Compress sequences
`--shard-regex`	str	—	Regex filter for shard files
`--verbose`	flag	on	Verbose output

`starling-pretokenize` — Pre-tokenize Sequences

Pre-encode FASTA files for rapid FAISS index construction (used before starling-search build).

starling-pretokenize sequences/*.fasta \
  --output tokens_dir \
  --combined \
  --workers 4

Options

Flag	Type	Default	Description
`fastas`	positional	—	Input FASTA file(s)
`-o, --output`	str	required	Output directory for token files
`--combined`	flag	off	Merge all into a single `.pt` file
`--prefix`	str	`pretokenized`	Prefix for combined output file
`--sequences`	str	—	Text file with FASTA paths (one per line)
`--workers`	int	1	Number of parallel tokenizer workers
`--no-progress`	flag	off	Hide progress bars

Training CLIs (Advanced)

These tools are primarily used for model development and retraining.

Command	Description
`starling-vae-train`	Train the VAE encoder model
`starling-ddpm-train`	Train the diffusion model
`starling-sample`	Generate samples from the VAE
`ae-train`	Train the autoencoder

Python Library

As well as the command-line tools, STARLING provides a powerful Python API for generating and analyzing ensembles programmatically.

Supported input formats

All main API functions accept sequences in multiple formats:

Format	Example
Single sequence string	`'MKVIFLAVLGLGIVVTTVLY'`
List of sequences	`['MKVIFLA...', 'MDVFMKG...']`
Dictionary of name→sequence	`{'protein_a': 'MKVIFLA...', 'protein_b': 'MDVFMKG...'}`
Path to a `.fasta` file	`'proteins.fasta'`
Path to a `.tsv` / `.seq.in` file	`'sequences.tsv'` (tab-separated `name\tsequence`)

`generate()` — Generate Ensembles

The generate function is the main entry point for generating conformational ensembles using the STARLING model. It accepts various input types, generates conformations using DDIM/DDPM, and optionally returns 3D structures.

from starling import generate

Basic usage

# Single sequence → single Ensemble object
E = generate('MKVIFLAVLGLGIVVTTVLY', return_single_ensemble=True)

# List of sequences → dict of Ensemble objects
E_dict = generate(['MKVIFLAVLGLGIVVTTVLY', 'MDVFMKGLSKAKEGVVAAAEKTKQGVAE'])

# Dictionary of sequences → dict of Ensemble objects
E_dict = generate({'seq1': 'MKVIFLAVLGLGIVVTTVLY', 'seq2': 'MDVFMKGLSKAKEGVVAAAEKTKQGVAE'})

# From a FASTA file, with 3D structures, saved to disk
E_dict = generate('proteins.fasta', conformations=500, return_structures=True, output_directory='./results')

Parameters

Parameter	Type	Default	Description
`user_input`	str / list / dict	—	Input sequences (see supported formats above)
`conformations`	int	400	Number of conformations to generate
`ionic_strength`	int	150	Solvent ionic strength in mM (20, 150, or 300)
`device`	str	`None` (auto)	Device: `'cpu'`, `'cuda:0'`, `'mps'`, etc.
`steps`	int	30	Number of denoising steps
`sampler`	str	`'ddim'`	Sampler backend
`return_structures`	bool	`False`	Generate 3D structures (PDB/XTC)
`batch_size`	int	100	Batch size for sampling
`num_cpus_mds`	int	auto	Max CPUs for MDS reconstruction
`num_mds_init`	int	4	Number of parallel MDS initializations
`output_directory`	str	`None`	Save directory (if set, writes `.starling` files to disk)
`output_name`	str	`None`	Override filename prefix (single-sequence mode)
`return_data`	bool	`True`	Return `Ensemble` objects (set `False` for fire-and-forget disk saves)
`verbose`	bool	`False`	Print status messages
`show_progress_bar`	bool	`True`	Show global progress bar
`show_per_step_progress_bar`	bool	`True`	Show per-step denoising progress bar
`pdb_trajectory`	bool	`False`	Save PDB trajectory alongside XTC
`return_single_ensemble`	bool	`False`	Return a single `Ensemble` instead of a dict (single-sequence mode)
`constraint`	Constraint	`None`	Constraint object for guided generation
`encoder_path`	str	`None`	Custom encoder model checkpoint
`ddpm_path`	str	`None`	Custom diffusion model checkpoint

Returns

dict[str, Ensemble] — by default (one entry per input sequence)
Ensemble — when return_single_ensemble=True and a single sequence is provided
None — when return_data=False

`sequence_encoder()` — Ensemble-Aware Sequence Embeddings

STARLING jointly trains a transformer-based sequence encoder that produces embeddings optimized for ensemble generation. Sequences with similar ensemble properties tend to have similar embeddings, making them useful for search and design applications.

from starling import sequence_encoder

Basic usage

# Residue-level embeddings (returns dict of name → tensor with shape (L, D))
embeddings = sequence_encoder('proteins.fasta')

# Protein-level embeddings via mean pooling
embeddings = sequence_encoder('proteins.fasta', aggregate=True)

# With custom settings
embeddings = sequence_encoder(
    {'prot_a': 'MKVIFLA...', 'prot_b': 'MDVFMKG...'},
    ionic_strength=150,
    batch_size=64,
    aggregate=True,
    device='cuda:0',
)

Parameters

Parameter	Type	Default	Description
`sequence_dict`	str / list / dict	—	Input sequences (same formats as `generate()`)
`ionic_strength`	int	150	Ionic strength in mM
`batch_size`	int	32	Sequences per batch
`aggregate`	bool	`False`	Return protein-level (mean-pooled) embeddings instead of residue-level
`device`	str	`None` (auto)	Target device
`output_directory`	str	`None`	Optional directory to save embeddings
`encoder_path`	str	`None`	Custom encoder checkpoint
`ddpm_path`	str	`None`	Custom diffusion model checkpoint
`pretokenized`	bool	`False`	Skip tokenization if inputs are already tokenized
`bucket`	bool	`False`	Adaptive bucketing by sequence length (improves throughput for variable-length inputs)
`bucket_size`	int	32	Max unique lengths per bucket
`free_cuda_cache`	bool	`False`	Release CUDA memory after each batch
`return_on_cpu`	bool	`True`	Move tensors to CPU before returning

Returns

dict[str, torch.Tensor] — keys are sequence names, values are tensors with shape (L, D) (residue-level) or (D,) (aggregated)
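
To make the two output shapes concrete, here is what mean pooling does to a residue-level embedding. This is a pure-Python sketch on nested lists with made-up numbers; for a real `(L, D)` tensor the equivalent operation is `tensor.mean(dim=0)`. The helper `mean_pool` is illustrative and not part of STARLING.

```python
def mean_pool(residue_embeddings):
    """Average an (L, D) nested list over the residue axis, giving a length-D list."""
    n_residues = len(residue_embeddings)
    if n_residues == 0:
        raise ValueError("need at least one residue embedding")
    dim = len(residue_embeddings[0])
    return [
        sum(row[d] for row in residue_embeddings) / n_residues
        for d in range(dim)
    ]


# Four residues (L = 4), three embedding dimensions (D = 3).
residue_level = [
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 2.0],
    [2.0, 4.0, 2.0],
    [2.0, 2.0, 2.0],
]
protein_level = mean_pool(residue_level)
print(protein_level)
```

Because the pooled vector has the same length regardless of sequence length, protein-level embeddings from sequences of different lengths can be compared directly.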

`load_ensemble()` — Load a Saved Ensemble

Reload a previously generated and saved STARLING ensemble from disk.

from starling import load_ensemble

ensemble = load_ensemble('path/to/my_favorite_ensemble.starling')

# Load without 3D structures (faster)
ensemble = load_ensemble('my_ensemble.starling', ignore_structures=True)

Parameters

Parameter	Type	Default	Description
`filename`	str	—	Path to a `.starling` file
`ignore_structures`	bool	`False`	Skip loading 3D structures for faster loading

Returns

Ensemble object

`set_compilation_options()` — PyTorch Model Compilation

If you intend to use STARLING repeatedly (e.g., in loops or batch processing), enable torch.compile to optimize model kernels. This adds overhead during the first call but improves subsequent runs by approximately 40% (tested on NVIDIA A5000).

import starling

# Enable compilation
starling.set_compilation_options(enabled=True)

# Enable with custom options
starling.set_compilation_options(
    enabled=True,
    mode='max-autotune',
    backend='inductor',
    fullgraph=True,
)

Parameters

Parameter	Type	Default	Description
`enabled`	bool	`None`	Enable or disable compilation
`mode`	str	`'default'`	Compilation mode: `'default'`, `'reduce-overhead'`, `'max-autotune'`
`backend`	str	`'inductor'`	Compilation backend
`fullgraph`	bool	`True`	Compile full graph
`dynamic`	bool	`None`	Handle dynamic shapes

Returns

dict with the current compilation settings
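
Whether compilation pays off depends on how many calls you make. The back-of-the-envelope sketch below estimates the break-even point. It assumes the "approximately 40%" figure means each compiled call takes about 40% less time, which is one possible reading, and the overhead and per-call timings are hypothetical numbers you would replace with your own measurements.

```python
import math


def break_even_calls(compile_overhead_s, baseline_s, time_saved_fraction=0.4):
    """Number of calls after which one-off compile overhead is repaid.

    Assumption: every compiled call saves `time_saved_fraction` of the
    baseline per-call time.
    """
    saved_per_call = baseline_s * time_saved_fraction
    if saved_per_call <= 0:
        raise ValueError("compiled calls must be faster than baseline")
    return math.ceil(compile_overhead_s / saved_per_call)


# Hypothetical timings: 60 s of compile overhead, 10 s per uncompiled call.
calls_needed = break_even_calls(compile_overhead_s=60.0, baseline_s=10.0)
print(calls_needed)
```

With these example numbers, compilation starts to win after 15 calls; for a one-off prediction it is likely not worth enabling.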

`Ensemble` Class

The Ensemble class represents an ensemble of conformations for a protein chain. It stores distance maps from which all structural parameters can be derived.

Properties

Property	Type	Description
`.sequence`	str	Amino acid sequence
`.number_of_conformations`	int	Total number of conformations
`.sequence_length`	int	Number of residues
`.has_structures`	bool	Whether 3D structures are available
`.trajectory`	SSProtein	3D trajectory object (lazy-built on first access)

Structural analysis methods

`.rij()` — Inter-residue distance

Ensemble.rij(i, j, return_mean=False, use_bme_weights=False)

Returns the distance between residues i and j across all conformations, or the mean distance if return_mean=True.

`.end_to_end_distance()` — End-to-end distance

Ensemble.end_to_end_distance(return_mean=False, use_bme_weights=False)

Returns the end-to-end distance across all conformations, or the mean.
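
Since an ensemble is stored as distance maps, the end-to-end distance of each conformation is simply the entry linking the first and last residue. The sketch below shows this on toy nested lists; it illustrates the idea and is not STARLING's implementation.

```python
def end_to_end_distances(distance_maps):
    """Return d(first residue, last residue) for every conformation."""
    return [frame[0][-1] for frame in distance_maps]


# Two toy conformations of a 3-residue chain (distances in Angstrom).
toy_maps = [
    [[0.0, 3.8, 7.0], [3.8, 0.0, 3.8], [7.0, 3.8, 0.0]],
    [[0.0, 3.8, 5.0], [3.8, 0.0, 3.8], [5.0, 3.8, 0.0]],
]

per_frame = end_to_end_distances(toy_maps)
mean_re = sum(per_frame) / len(per_frame)
print(per_frame, mean_re)
```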

`.radius_of_gyration()` — Radius of gyration

Ensemble.radius_of_gyration(return_mean=False, force_recompute=False, use_bme_weights=False)

Returns the radius of gyration across all conformations, or the mean.
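
For equally weighted beads, the radius of gyration can be obtained from pairwise distances alone via `Rg^2 = (1 / (2 N^2)) * sum_ij d_ij^2`, where the sum runs over all ordered pairs. The sketch below applies that identity to a single toy distance map; whether STARLING computes Rg exactly this way is an assumption, and `radius_of_gyration` here is a standalone helper, not the `Ensemble` method.

```python
import math


def radius_of_gyration(distance_map):
    """Rg of N equal-mass points from their full N x N distance matrix."""
    n = len(distance_map)
    # Sum of squared distances over all ordered pairs (i, j).
    total = sum(d * d for row in distance_map for d in row)
    return math.sqrt(total / (2.0 * n * n))


# Three beads on a line at x = 0, 4 and 8.
line_map = [
    [0.0, 4.0, 8.0],
    [4.0, 0.0, 4.0],
    [8.0, 4.0, 0.0],
]
rg = radius_of_gyration(line_map)
print(rg)
```

As a sanity check, the same three beads have a centroid at x = 4 and squared deviations 16, 0 and 16, so Rg^2 = 32 / 3, matching the value computed from distances.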

`.hydrodynamic_radius()` — Hydrodynamic radius

Ensemble.hydrodynamic_radius(return_mean=False, force_recompute=False, mode='nygaard', alpha1=0.216, alpha2=4.06, alpha3=0.821)

Computes the hydrodynamic radius from the ensemble.
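
The default `alpha1`, `alpha2` and `alpha3` values match the empirical Rg-to-Rh relation of Nygaard et al. (2017). The sketch below writes out that relation as we understand it, `Rg / Rh = alpha1 * (Rg - alpha2 * N^0.33) / (N^0.60 - N^0.33) + alpha3`, with distances in Angstrom. Treat this as an assumption about what `mode='nygaard'` does and check it against the STARLING source before relying on it; the input values are made up.

```python
def nygaard_hydrodynamic_radius(rg, n_residues,
                                alpha1=0.216, alpha2=4.06, alpha3=0.821):
    """Estimate Rh (Angstrom) from Rg (Angstrom) and chain length.

    Assumed form of the Nygaard et al. relation; not copied from STARLING.
    """
    n_033 = n_residues ** 0.33
    n_060 = n_residues ** 0.60
    rg_over_rh = alpha1 * (rg - alpha2 * n_033) / (n_060 - n_033) + alpha3
    return rg / rg_over_rh


# Hypothetical example: a 140-residue chain with Rg = 35 Angstrom.
rh = nygaard_hydrodynamic_radius(rg=35.0, n_residues=140)
print(round(rh, 2))
```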

`.local_radius_of_gyration()` — Local Rg for a sub-region

Ensemble.local_radius_of_gyration(start, end, return_mean=False, use_bme_weights=False)

Returns the radius of gyration for a sub-region defined by residues start to end.

`.distance_maps()` — Pairwise distance maps

Ensemble.distance_maps(return_mean=False, use_bme_weights=False)

Returns the raw distance maps as (n, L, L) NumPy arrays, or the average distance map if return_mean=True.

`.contact_map()` — Contact maps

Ensemble.contact_map(contact_thresh=11, return_mean=False, return_summed=False)

Returns binary contact maps using a distance threshold. If return_mean=True, returns the contact probability (0–1) for each residue pair. If return_summed=True, returns summed contacts instead.
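
The relationship between distance maps, binary contacts and contact probability can be sketched in a few lines. This pure-Python example uses the default threshold of 11 and treats a pair as in contact when its distance is strictly below the threshold; the strict comparison is an assumption, and the two toy conformations are invented.

```python
def contact_probability(distance_maps, contact_thresh=11.0):
    """Fraction of conformations in which each residue pair is in contact."""
    n_frames = len(distance_maps)
    n_res = len(distance_maps[0])
    counts = [[0] * n_res for _ in range(n_res)]
    for frame in distance_maps:
        for i in range(n_res):
            for j in range(n_res):
                # Assumption: contact means distance < threshold.
                if frame[i][j] < contact_thresh:
                    counts[i][j] += 1
    return [[c / n_frames for c in row] for row in counts]


# A fully extended and a bent 4-residue chain (distances in Angstrom).
extended = [
    [0.0, 3.8, 7.6, 11.4],
    [3.8, 0.0, 3.8, 7.6],
    [7.6, 3.8, 0.0, 3.8],
    [11.4, 7.6, 3.8, 0.0],
]
bent = [
    [0.0, 3.8, 6.5, 6.0],
    [3.8, 0.0, 3.8, 6.5],
    [6.5, 3.8, 0.0, 3.8],
    [6.0, 6.5, 3.8, 0.0],
]

probs = contact_probability([extended, bent])
print(probs[0][3])
```

Here residues 0 and 3 are in contact in only one of the two conformations, so their contact probability is 0.5.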

3D structure reconstruction

`.build_ensemble_trajectory()`

Ensemble.build_ensemble_trajectory(
    batch_size=100,
    num_cpus_mds=configs.DEFAULT_CPU_COUNT_MDS,
    num_mds_init=configs.DEFAULT_MDS_NUM_INIT,
    device=None,
    force_recompute=False,
    progress_bar=True,
)

Reconstructs 3D coordinates from distance maps using multidimensional scaling (MDS). Returns an SSProtein trajectory object.

Error checking

`.check_for_errors()`

Ensemble.check_for_errors(remove_errors=False, verbose=True, rebuild_trajectory=False)

Scans for problematic conformations (e.g., impossible distances). Returns a list of bad frame indices. If remove_errors=True, removes them in place.

Bayesian Maximum Entropy (BME) reweighting

`.reweight_bme()`

Ensemble.reweight_bme(experimental_data, ensemble_properties, weights=None, verbose=True)

Performs BME reweighting against experimental data. After reweighting, structural property methods accept use_bme_weights=True for reweighted statistics.
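
Conceptually, a reweighted statistic is a weighted average over conformations. The sketch below shows that calculation, plus the Kish effective sample size as a generic diagnostic of how concentrated the weights are. The weights and Rg values are invented, both helpers are illustrative, and the Kish estimate is not necessarily what STARLING reports.

```python
def weighted_mean(values, weights):
    """Average of per-conformation values under (unnormalized) weights."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


def kish_effective_sample_size(weights):
    """(sum w)^2 / sum(w^2); equals len(weights) when all weights are equal."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


# Hypothetical per-conformation Rg values and reweighting weights.
rg_values = [20.0, 30.0, 40.0, 50.0]
weights = [0.1, 0.2, 0.3, 0.4]

uniform_mean = sum(rg_values) / len(rg_values)
reweighted_mean = weighted_mean(rg_values, weights)
n_eff = kish_effective_sample_size(weights)
print(uniform_mean, reweighted_mean, n_eff)
```

A small effective sample size relative to the number of conformations suggests that a few conformations dominate the reweighted averages.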

File I/O

`.save()` — Save an ensemble to disk

Ensemble.save(filename_prefix, compress=False, reduce_precision=None, compression_algorithm='lzma', verbose=True)

Saves the ensemble as a .starling archive.

`.save_trajectory()` — Save 3D trajectory

Ensemble.save_trajectory(filename_prefix, pdb_trajectory=False)

Saves the 3D trajectory as XTC (or PDB if pdb_trajectory=True).

Constrained Generation

STARLING allows you to generate structural ensembles with constraints — such as experimentally measured distances or local/global shape features. These are passed to generate() via the constraint parameter.

Available constraint types

from starling.inference.constraints import (
    DistanceConstraint,
    RgConstraint,
    ReConstraint,
    HelicityConstraint,
    BondConstraint,
    StericClashConstraint,
    MultiConstraint,
)

`DistanceConstraint` — target distance between two residues

constraint = DistanceConstraint(resid1=10, resid2=200, target=50)

`RgConstraint` — target radius of gyration

constraint = RgConstraint(target=50)

`ReConstraint` — target end-to-end distance

constraint = ReConstraint(target=100)

`HelicityConstraint` — enforce helical structure in a range

constraint = HelicityConstraint(resid_start=10, resid_end=100)

`BondConstraint` — maintain consecutive residue spacing

constraint = BondConstraint(bond_length=3.81)

`StericClashConstraint` — prevent steric clashes

constraint = StericClashConstraint(steric_clash_definition=5.0)

`MultiConstraint` — combine multiple constraints

constraint = MultiConstraint([
    DistanceConstraint(resid1=10, resid2=200, target=50),
    RgConstraint(target=30),
])

Applying constraints

ensemble = generate(sequence, constraint=constraint)

Tuning constraint parameters

All constraints accept the following keyword arguments:

Parameter	Type	Default	Description
`force_constant`	float	2.0	Strength of the constraint
`tolerance`	float	0.0	Tolerance around the target value
`schedule`	str	`'cosine'`	Weight schedule: `'cosine'` or `'bell_shaped'`
`guidance_start`	float	0.0	When to start applying the constraint (0.0 = start of denoising)
`guidance_end`	float	1.0	When to stop applying the constraint (1.0 = end of denoising)

Guidance timing reference:

Window	`guidance_start`	`guidance_end`	What's being denoised
Early	0.0	0.3	Mostly noise, minimal structural information
Mid	0.3	0.7	Emerging structure, useful features begin to form
Late	0.7	1.0	Fine details, near-final structural refinement

Experimenting with these parameters for your particular application is recommended.
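
To build intuition for how `force_constant`, `tolerance`, `guidance_start` and `guidance_end` interact, here is a toy model. It assumes a flat-bottom harmonic penalty and a simple on/off guidance window; both are illustrative guesses at the general idea, not STARLING's actual guidance code, and the function names are hypothetical.

```python
def flat_bottom_penalty(value, target, force_constant=2.0, tolerance=0.0):
    """Zero inside target +/- tolerance, quadratic outside.

    Illustrative functional form only (an assumption).
    """
    excess = max(0.0, abs(value - target) - tolerance)
    return force_constant * excess ** 2


def guidance_is_active(step, total_steps, guidance_start=0.0, guidance_end=1.0):
    """True when the fraction of denoising completed lies inside the window."""
    progress = step / total_steps
    return guidance_start <= progress <= guidance_end


# A distance of 55 against a target of 50 with a tolerance of 2:
# the excess is 3, so the penalty is 2.0 * 3**2.
outside = flat_bottom_penalty(55.0, 50.0, force_constant=2.0, tolerance=2.0)
inside = flat_bottom_penalty(51.0, 50.0, force_constant=2.0, tolerance=2.0)

# A "Mid" window (0.3 to 0.7) over 30 denoising steps.
active_at_10 = guidance_is_active(10, 30, guidance_start=0.3, guidance_end=0.7)
active_at_3 = guidance_is_active(3, 30, guidance_start=0.3, guidance_end=0.7)
print(outside, inside, active_at_10, active_at_3)
```

In this toy picture, raising `tolerance` widens the region with no penalty, while narrowing the guidance window restricts the constraint to a subset of denoising steps.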

FAQs/Help

I get a NumPy compilation warning error!?

Oh no! You get the following error message:

A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.2.3 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

We have seen this when folks try to install on Intel Macs, because (Py)Torch stopped supporting Intel Macs after torch=2.2.2. If you're NOT on an Intel Mac, the recommended way to resolve this is by upgrading torch:

# recommended, but ANY version above 2.2.2 should work
pip install torch==2.6.0

Or, if you're on an Intel Mac and torch > 2.2.2 is not available, downgrade numpy:

pip install numpy==1.26.1

Potential PyTorch / CUDA version issues

If you are on an older version of CUDA, pip may install a torch build that was compiled for a different CUDA version. This can cause a segfault when running STARLING. To fix this, install the torch build that matches your CUDA version. For example, to install PyTorch on Linux using pip for CUDA 12.1, you would run:

pip install torch --index-url https://download.pytorch.org/whl/cu121

To figure out which version of CUDA you currently have (assuming you have a CUDA-enabled GPU that is set up correctly), you need to run:

nvidia-smi

This should return information about your GPU, NVIDIA driver version, and your CUDA version at the top.

Please see the PyTorch install instructions for more info.

Maximum sequence length

STARLING currently supports sequences up to 380 residues in length.
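
When working with large sequence sets, it can help to separate out over-length sequences before generation. The helper below is a minimal sketch (`split_by_length` is not part of STARLING) and reads "up to 380" as inclusive.

```python
MAX_LENGTH = 380  # limit stated above, treated as inclusive


def split_by_length(sequences, max_length=MAX_LENGTH):
    """Split {name: sequence} into (accepted, rejected) by sequence length."""
    accepted, rejected = {}, {}
    for name, sequence in sequences.items():
        if len(sequence) <= max_length:
            accepted[name] = sequence
        else:
            rejected[name] = sequence
    return accepted, rejected


# Toy inputs: one short sequence, one at the limit, one over it.
sequences = {
    "short_idr": "MKVIFLAVLGLGIVVTTVLY",
    "at_limit": "S" * 380,
    "too_long": "G" * 381,
}
accepted, rejected = split_by_length(sequences)
print(sorted(accepted), sorted(rejected))
```

The `accepted` dictionary is already in the name-to-sequence form that `generate()` takes as input.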
