# Metient Guide

Metient takes as input a clone tree and mutation data from one or more tumor sites, and infers where each clone originated, producing a migration history that describes how the cancer spread between anatomical sites in a patient.

Start with a tutorial notebook for a hands-on walkthrough. Use this guide as a reference alongside it for understanding parameters, input formats, and interpreting outputs.

## Step 1: Choose your function

Metient exposes four main functions. Pick yours based on two questions.

**How many patients do you have?** If you have a cohort of ~5 or more patients with the same cancer type, use Metient-calibrate, which learns optimal parsimony weights from your data. If you have fewer patients, or want to use pre-calibrated or custom weights, use Metient-evaluate.

**Do you know which clones are present at which sites?** Metient needs to know which mutation clusters (clones) are present at each anatomical site. If you have reference and variant read counts, Metient can estimate this for you. If you already have binary present/absent calls per clone per site, you can provide those directly (e.g. from single-cell data, or the output of your tree estimation).

| | I have ref/var read counts and want Metient to estimate which clones are present at each site | I already know which clones are present at each site (binary present/absent per clone per site) |
| --- | --- | --- |
| **Evaluate** (pre-set weights, any # of patients) | `met.evaluate()` (Tutorial 3) | `met.evaluate_label_clone_tree()` (Tutorial 4) |
| **Calibrate** (learn weights, cohort of ~5+) | `met.calibrate()` (Tutorial 1) | `met.calibrate_label_clone_tree()` (Tutorial 2) |

## Step 2: Prepare your input files

Each patient needs two files: a TSV with mutation/sample data and a TXT file with clone tree edges.

### Tree TXT file

Each line is a space-separated edge: `parent_index child_index`. Indices must match `cluster_index` in the TSV. Example.
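To make the format concrete, here is a minimal sketch that writes and parses a hypothetical four-node tree (the file name and indices are made up for illustration):

```python
from pathlib import Path

# Hypothetical clone tree: cluster 0 is the root with children 1 and 2,
# and cluster 3 descends from cluster 1. One "parent child" edge per line.
tree_txt = "0 1\n0 2\n1 3\n"
Path("tree.txt").write_text(tree_txt)

# Parsing the edges back gives (parent_index, child_index) pairs; these
# indices must match cluster_index in the accompanying TSV.
edges = [tuple(map(int, line.split())) for line in tree_txt.splitlines()]
print(edges)  # [(0, 1), (0, 2), (1, 3)]
```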

### TSV file

Which TSV format you need depends on which function you chose in Step 1.

#### Format 1a: Read count TSV

Used by: `evaluate()`, `calibrate()`. Example.

Each row = reference and variant read counts at a single locus in a single tumor sample.

| Column | Description |
| --- | --- |
| `anatomical_site_index` | Zero-based index for `anatomical_site_label`. Rows with the same site index and `cluster_index` are pooled. |
| `anatomical_site_label` | Name of the anatomical site. |
| `character_index` | Zero-based index for `character_label`. |
| `character_label` | Mutation name (short, no colons). |
| `cluster_index` | Cluster index; must match the tree TXT file. Rows with the same site index and `cluster_index` are pooled. |
| `ref` | Reference allele read count. |
| `var` | Variant allele read count. |
| `site_category` | `primary` or `metastasis`. Multiple primaries trigger one run per candidate primary. |
| `var_read_prob` | Variant read correction factor for copy number effects on variant allele frequency. See details below. |
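As a sketch of what rows in this format look like, the snippet below writes a minimal two-site, two-cluster TSV with Python's `csv` module. All sample names, read counts, and mutation labels are hypothetical:

```python
import csv

columns = ["anatomical_site_index", "anatomical_site_label", "character_index",
           "character_label", "cluster_index", "ref", "var", "site_category",
           "var_read_prob"]

# Hypothetical rows: one mutation per cluster, observed in a primary breast
# sample and a liver metastasis.
rows = [
    [0, "breast", 0, "TP53_mut", 0, 180, 120, "primary",    0.5],
    [0, "breast", 1, "KRAS_mut", 1, 200,  20, "primary",    0.5],
    [1, "liver",  0, "TP53_mut", 0, 150, 100, "metastasis", 0.5],
    [1, "liver",  1, "KRAS_mut", 1, 160,  90, "metastasis", 0.5],
]

with open("patient_1.tsv", "w", newline="") as f:
    writer = csv.writer(f, delimiter="\t")
    writer.writerow(columns)
    writer.writerows(rows)
```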

#### Format 1b: Known clone presence TSV

Used by: `evaluate_label_clone_tree()`, `calibrate_label_clone_tree()`. Example.

Each row = a single mutation/cluster in a single tumor sample.

| Column | Description |
| --- | --- |
| `anatomical_site_index` | Zero-based index for `anatomical_site_label`. Rows with the same site index and `cluster_index` are pooled. |
| `anatomical_site_label` | Name of the anatomical site. |
| `cluster_index` | Cluster index; must match the tree TXT file. Rows with the same site index and `cluster_index` are pooled. |
| `cluster_label` | Mutation/cluster name (short, no colons). |
| `present` | 0 or 1: whether this clone is present at this site. |
| `site_category` | `primary` or `metastasis`. Multiple possible primaries trigger one run per candidate primary. |
| `num_mutations` | Number of mutations in this cluster. |

### `var_read_prob` details

`var_read_prob` corrects for the effect that copy number alterations (CNAs) have on the relationship between variant allele frequency (VAF, the proportion of alleles with the mutation) and subclonal frequency (the proportion of cells with the mutation).

For a given mutation *j*, `var_read_prob` is the probability of observing a read from the variant allele in a cell that carries mutation *j*. This depends on how many copies of the reference vs. variant allele exist in that cell:

| Scenario | `var_read_prob` | Why |
| --- | --- | --- |
| Diploid locus, no CNAs | 0.5 | 1 variant + 1 reference allele |
| Haploid locus (e.g., male sex chromosome), no CNAs | 1.0 | 1 variant allele, no reference |
| CNA duplicated the reference allele before mutation *j* occurred | 0.333 | 1 variant + 2 reference alleles |

**Using a copy number caller:** If your CN caller reports major and minor copy number, see B.1 in Metient's supplementary information on how to compute `var_read_prob`.
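The scenarios in the table all reduce to a simple ratio: the number of variant-allele copies over the total number of allele copies at the locus in a mutation-carrying cell. A minimal helper (not part of the Metient API, just an illustration of the arithmetic):

```python
def var_read_prob(n_variant_copies: int, n_reference_copies: int) -> float:
    """Probability of sampling a variant-allele read from a cell carrying
    the mutation, given the allele copy numbers at that locus."""
    return n_variant_copies / (n_variant_copies + n_reference_copies)

print(var_read_prob(1, 1))             # 0.5   -> diploid locus, no CNAs
print(var_read_prob(1, 0))             # 1.0   -> haploid locus
print(round(var_read_prob(1, 2), 3))   # 0.333 -> reference allele duplicated pre-mutation
```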


## Step 3: Configure weights

How you set weights depends on whether you're using evaluate or calibrate.

### If using evaluate: pick or create weights

You can use a pre-calibrated preset:

| Preset | How the weights were fit | Recommended for |
| --- | --- | --- |
| `Weights.pancancer_genetic_organotropism_uniform_weighting()` | Genetic + organotropism, uniform cohort weighting | Human data (recommended) |
| `Weights.pancancer_genetic_uniform_weighting()` | Genetic only, uniform cohort weighting | Non-human data |
| `Weights.pancancer_genetic_cohort_size_weighting()` | Genetic only, weighted by cohort size | |
| `Weights.pancancer_genetic_organotropism_cohort_size_weighting()` | Genetic + organotropism, weighted by cohort size | |

```python
weights = met.Weights.pancancer_genetic_organotropism_uniform_weighting()
```

Or define custom weights:

```python
weights = met.Weights(mig=0.5, comig=0.3, seed_site=0.2, gen_dist=0.01, organotrop=0.005)
```

| Parameter | Default | Description |
| --- | --- | --- |
| `mig` | | Penalizes total migration number |
| `comig` | | Penalizes co-migration number |
| `seed_site` | | Penalizes number of seeding sites |
| `gen_dist` | 0.0 | Penalizes genetic distance (requires `num_mutations` in input) |
| `organotrop` | 0.0 | Penalizes deviation from organotropism priors |

> [!TIP]
> Use much higher penalties for the parsimony weights (`mig`, `comig`, `seed_site`) than for `gen_dist` or `organotrop`. If using both genetic distance and organotropism, set both > 0.

### If using calibrate: choose a calibration mode

Calibrate learns weights for you. The `calibration_type` parameter controls which objectives are used:

| `calibration_type` | Description |
| --- | --- |
| `"genetic"` | Calibrates using genetic distance between clones |
| `"organotropism"` | Calibrates using organ-specific metastasis frequencies |
| `"both"` | Calibrates using both genetic distance and organotropism |

### Organotropism dictionaries (optional)

If using organotropism (in either evaluate or calibrate), provide a dictionary per patient mapping site names to metastasis frequencies. Values should be normalized (sum to 1).

```python
Os = [
    {"Liver": 0.5, "Lung": 0.4, "Brain": 0.1},   # Patient 1
    {"Lymph": 0.7, "Bone": 0.3},                 # Patient 2
]
```

> [!TIP]
> If you don't have organotropism frequencies for your cancer type, you can compute them from our MSK-MET metastasis counts table, derived from Nguyen et al. This table has counts by cancer type × metastatic site. Make sure to normalize so the frequencies provided for each patient sum to 1. Note that your cancer type and the patient's tumor sites need to be mappable to the cancer types and sites in this table.
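For example, hypothetical per-site metastasis counts (e.g. looked up from a counts table for the patient's cancer type) can be turned into the normalized frequencies Metient expects like this:

```python
# Hypothetical raw metastasis counts for one patient's tumor sites.
counts = {"Liver": 250, "Lung": 200, "Brain": 50}

# Normalize so the frequencies for this patient sum to 1.
total = sum(counts.values())
frequencies = {site: n / total for site, n in counts.items()}

print(frequencies)  # {'Liver': 0.5, 'Lung': 0.4, 'Brain': 0.1}
```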


## Step 4: Set run options

### Performance parameters

| Parameter | Default | What it does |
| --- | --- | --- |
| `sample_size` | -1 (auto) | How many solutions to explore in parallel per run. Limited by memory. |
| `num_runs` | -1 (auto) | How many times to repeat the full algorithm. Runs are sequential, so more runs don't cost extra memory. |
| `solve_polytomies` | False | Try resolving nodes with >2 children into binary splits. Slower, and not tested on trees beyond 100 nodes. |

The defaults auto-calculate both `sample_size` and `num_runs` based on your tree size and number of sites. This is fine for trees with fewer than 30 nodes.

`sample_size` and `num_runs` are the main drivers of runtime. Each run processes `sample_size` solutions in parallel, and Metient repeats this `num_runs` times sequentially. If Metient is running slower than expected, these are the knobs to turn down, but doing so may affect result quality.

If your results change significantly between runs, increase `num_runs`. If you want to override the defaults, set both explicitly; the total number of samples Metient considers is `num_runs × sample_size`.

> [!IMPORTANT]
> Use a GPU for trees with >100 nodes. Metient is much faster on GPU, and large inputs may not be practical on CPU. If you run into GPU memory issues on very large trees, reduce `sample_size` and increase `num_runs` to compensate.
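Because the total search budget is `num_runs × sample_size`, you can trade peak memory for wall-clock time without shrinking the search. A sketch of the arithmetic (the specific numbers are arbitrary):

```python
# Hypothetical setting that is too large for GPU memory.
sample_size, num_runs = 2048, 4

# Halve the per-run parallelism and double the sequential repeats:
# peak memory drops while the total number of samples stays the same.
reduced_sample_size = sample_size // 2
increased_num_runs = num_runs * 2

print(reduced_sample_size, increased_num_runs)  # 1024 8
print(sample_size * num_runs == reduced_sample_size * increased_num_runs)  # True
```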

### PrintConfig

Controls visualization and output saving:

```python
print_config = met.PrintConfig(
    visualize=True,        # Visualize loss, best tree, and migration graph
    verbose=False,         # Print debug info
    k_best_trees=10,       # Number of solutions to visualize (all are saved to pkl)
    save_outputs=True,     # Save pngs and pickle files
    custom_colors=None,    # Array of hex strings (length = number of anatomical sites)
    display_labels=True    # Display node labels on migration history tree
)
```

## Step 5: Run Metient

### Evaluate (single patient)

```python
import metient as met

weights = met.Weights.pancancer_genetic_organotropism_uniform_weighting()
print_config = met.PrintConfig(visualize=True, save_outputs=True, k_best_trees=5)

results = met.evaluate(
    tree_fn="path/to/tree.txt",
    tsv_fn="path/to/mutations.tsv",
    weights=weights,
    print_config=print_config,
    output_dir="./output",
    run_name="patient_1"
)
```

### Calibrate (cohort)

```python
import metient as met

print_config = met.PrintConfig(visualize=True, save_outputs=True, k_best_trees=5)

results = met.calibrate(
    tree_fns=["patient1_tree.txt", "patient2_tree.txt", "patient3_tree.txt"],
    tsv_fns=["patient1.tsv", "patient2.tsv", "patient3.tsv"],
    print_config=print_config,
    output_dir="./output",
    run_names=["patient_1", "patient_2", "patient_3"],
    calibration_type="both",
    Os=[{"Liver": 0.5, "Lung": 0.4, "Brain": 0.1},
        {"Lymph": 0.7, "Bone": 0.3},
        {"Liver": 0.6, "Lung": 0.4}]
)
```

> [!NOTE]
> `tree_fns[i]`, `tsv_fns[i]`, `run_names[i]`, and `Os[i]` must all correspond to patient *i*.

### Full function signatures

```python
met.evaluate(tree_fn, tsv_fn, weights, print_config, output_dir, run_name,
             O=None, sample_size=-1, solve_polytomies=False, num_runs=5)

met.evaluate_label_clone_tree(tree_fn, tsv_fn, weights, print_config, output_dir, run_name,
                              O=None, sample_size=-1, solve_polytomies=False, num_runs=5)

met.calibrate(tree_fns, tsv_fns, print_config, output_dir, run_names, calibration_type,
              Os=None, sample_size=-1, solve_polytomies=False, num_runs=5)

met.calibrate_label_clone_tree(tree_fns, tsv_fns, print_config, output_dir, run_names, calibration_type,
                               Os=None, sample_size=-1, solve_polytomies=False, num_runs=5)
```

## Step 6: Interpret your results

Metient outputs a pickle file per patient in your `output_dir`.

### Quick inspection

```python
import pickle, gzip
import metient as met

with gzip.open("output/patient_1.pkl.gz", "rb") as f:
    pkl = pickle.load(f)

print(met.weighted_seeding_pattern(pkl))
print(met.weighted_phyleticity(pkl))
print(met.weighted_site_clonality(pkl))
print(met.weighted_genetic_clonality(pkl))
```

### Pickle file contents

| Key | Description |
| --- | --- |
| `anatomical_sites` | List of sites in the order used for all matrices below |
| `node_info` | List of dicts (best → worst solution). Maps node index → `(label, is_leaf, is_polytomy_resolver_node)`. Leaf nodes added by Metient are labeled `<parent>_<site>`. |
| `node_labels` | List of arrays (best → worst solution). Shape (num_sites, num_nodes). Each column is a one-hot site assignment. |
| `parents` | List of 1-D arrays (best → worst solution). `parents[i]` = parent of node i; root = -1. |
| `observed_proportions` | Array (num_sites, num_clusters). Values > 0.05 = clone present at site. |
| `losses` | List of losses, best → worst solution |
| `probabilities` | List of probabilities, best → worst solution |
| `primary_site` | Anatomical site used as primary |
| `loss_info` | List of dicts with unweighted loss components (migration number, comigration number, etc.) |
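Two patterns follow directly from this layout: per-solution lists are ordered best → worst, so index 0 is always the best solution, and thresholding `observed_proportions` at 0.05 gives a binary presence matrix. A sketch on a toy stand-in for the pickle dict (real files are loaded as shown in Quick inspection):

```python
# Toy stand-in for the loaded pickle: 2 sites x 3 clusters, 2 solutions.
pkl = {
    "anatomical_sites": ["breast", "liver"],
    "observed_proportions": [[0.60, 0.30, 0.01],
                             [0.00, 0.45, 0.20]],
    "losses": [12.3, 15.1],
}

# Clone presence per site: proportion > 0.05 means "present".
present = [[p > 0.05 for p in row] for row in pkl["observed_proportions"]]
print(present)  # [[True, True, False], [False, True, True]]

# Solutions are ordered best -> worst, so index 0 is the best one.
print(pkl["losses"][0])  # 12.3
```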

### Analysis utilities

These functions help characterize the migration history. The weighted variants (recommended) operate on the full pickle dict and weight across all solutions by probability:

| Function | Returns |
| --- | --- |
| `met.weighted_seeding_pattern(pkl)` | Weighted seeding pattern |
| `met.weighted_site_clonality(pkl)` | Weighted site clonality |
| `met.weighted_genetic_clonality(pkl)` | Weighted genetic clonality |
| `met.weighted_phyleticity(pkl, sites=None)` | Weighted phyleticity (optionally restricted to specific sites) |

For single-solution analysis, extract matrices from the pickle and use these functions (`V` = vertex labeling matrix, `A` = adjacency matrix, `node_info` = dict from the pickle):

| Function | Returns |
| --- | --- |
| `met.migration_graph(V, A)` | Migration graph |
| `met.seeding_pattern(V, A)` | {primary single-source, single-source, multi-source, reseeding} |
| `met.site_clonality(V, A)` | Monoclonal / polyclonal |
| `met.genetic_clonality(V, A, node_info)` | Monoclonal / polyclonal |
| `met.phyleticity(V, A, node_info)` | Monophyletic / polyphyletic |
| `met.seeding_clusters(V, A, node_info)` | Nodes whose parent has a different site |
| `met.adjacency_matrix_from_parents(parents)` | Sparse adjacency matrix from parents vector |
```python
# Example: per-solution analysis
A = met.adjacency_matrix_from_parents(pkl["parents"][0])
V = pkl["node_labels"][0]
node_info = pkl["node_info"][0]

print(met.seeding_pattern(V, A))
print(met.phyleticity(V, A, node_info))
```