Indelligent

Reconstructs diploid DNA allelic sequences from mixed sequencing traces containing heterozygous insertions/deletions (indels). Uses dynamic programming to resolve IUPAC ambiguity codes into allele pairs.

Based on the algorithm by D.A. Dmitriev & R.A. Rakitov. Ported from the original Classic ASP/VBScript implementation.

Installation

Pre-built binaries

Download from the releases page for Linux (x86/arm64), macOS (x86/arm64), and Windows.

From source

Requires Go 1.24+.

go install github.com/SpeciesFileGroup/indelligent/indelligent@latest

Using just (command runner)

git clone https://github.com/SpeciesFileGroup/indelligent.git
cd indelligent
just install

Usage

Analyze a single sequence

indelligent "TKGKKSCMW"

Output (FASTA, colored by default):

>allele1
.TGGTGCCAT
>allele2
TTGGTGCCA.
>combined
TTGGTGCCAT

Mismatches and non-standard IUPAC bases are highlighted in red. Use --no-color to disable.
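The input string is what a sequencer reports when both alleles are read at once. Each position records the two bases present at that site as a single IUPAC code. The short Go sketch below runs the example in reverse. It starts from the two alleles shown above, with their gap characters removed, and rebuilds the mixed trace. The names `iupac` and `superimpose` are illustrative and are not part of indelligent's API. The sketch assumes equal-length alleles containing only A, C, G and T.

```go
package main

import (
	"fmt"
	"sort"
)

// iupac maps an unordered pair of nucleotides (written in sorted order)
// to its IUPAC code; a pair of identical bases maps to that base.
var iupac = map[string]byte{
	"AA": 'A', "CC": 'C', "GG": 'G', "TT": 'T',
	"AG": 'R', "CT": 'Y', "CG": 'S', "AT": 'W', "GT": 'K', "AC": 'M',
}

// superimpose returns the mixed trace produced by reading two equal-length
// alleles simultaneously, one IUPAC code per position.
func superimpose(a1, a2 string) string {
	out := make([]byte, len(a1))
	for i := range out {
		pair := []byte{a1[i], a2[i]}
		sort.Slice(pair, func(x, y int) bool { return pair[x] < pair[y] })
		out[i] = iupac[string(pair)]
	}
	return string(out)
}

func main() {
	// The two alleles are offset by one base along the whole fragment,
	// so almost every position shows two different bases.
	fmt.Println(superimpose("TGGTGCCAT", "TTGGTGCCA")) // TKGKKSCMW
}
```

Indelligent solves the harder inverse problem. It starts from a trace like `TKGKKSCMW` and recovers the two alleles.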

Output formats

# FASTA (default)
indelligent "TKGKKSCMW"

# JSON (compact)
indelligent "TKGKKSCMW" -f compact

# JSON (pretty-printed)
indelligent "TKGKKSCMW" -f pretty

# Tab-separated values
indelligent "TKGKKSCMW" -f tsv

Add -d for detailed statistics in FASTA and TSV output.

Process many sequences from a file

One sequence per line:

indelligent sequences.txt > results.fasta

Or via stdin:

cat sequences.txt | indelligent

Stream mode

For lower memory usage with large files, use stream mode (-s), which processes one sequence at a time instead of in batches:

indelligent -s sequences.txt

Web interface

Start the built-in web server:

indelligent -p 1977

Then open http://localhost:1977 in a browser. The web interface provides an interactive form matching the original ASP application, with color-highlighted results, simulation tools, and parameter controls.

REST API

When the web server is running:

# Health check
curl http://localhost:1977/api/v1/ping

# Version
curl http://localhost:1977/api/v1/version

# Analyze a single sequence
curl http://localhost:1977/api/v1/TKGKKSCMW

# Analyze multiple sequences
curl -X POST http://localhost:1977/api/v1/ \
  -H "Content-Type: application/json" \
  -d '{"sequences": ["TKGKKSCMW", "ATCRYSMK"]}'

Simulation mode

Generate a random fragment with known indels and analyze it:

indelligent --simulate --sim-length 100 --sim-indel 3

CLI Reference

Output and processing

Flag                 Description
-V, --version        Show version and build info
-f, --format         Output format: fasta (default), compact, pretty, tsv
-d, --details        Include full statistics in output
-j, --jobs           Number of worker threads (default: number of CPUs)
-p, --port           Start the web service on the given port
-s, --stream         Process one sequence at a time (stream mode) instead of in batches
-u, --unordered      Allow unordered output (faster)
-b, --batch-size     Maximum sequences per batch (default: 50000)
-q, --quiet          Suppress progress logging
--no-color           Disable colored output

Analysis parameters

Flag                 Description
-m, --max-shift      Maximum phase shift size in bp (default: 15)
-P, --shift-penalty  Penalty for changing the phase shift (default: 2)
-x, --fix-shifts     Restrict to specific shift magnitudes, comma-separated (e.g. 1,3)
-A, --align-alleles  Align alleles with gap characters (default: true)
-a, --float-align    Floating indel alignment: left or right (default: right)
-L, --long-indels    Display longer indel variants

Simulation

Flag                 Description
--simulate           Run in simulation mode
--sim-length         Fragment length in bp (default: 10)
--sim-indel          First indel size in bp (default: 1)
--sim-indel2         Second indel size in bp (default: 0)
--sim-allele         Allele carrying the second indel: 1 or 2 (default: 2)
--sim-subs           Number of substitutions (default: 0)

Go Library

Indelligent can be used as a Go library:

package main

import (
    "fmt"

    "github.com/SpeciesFileGroup/indelligent"
)

func main() {
    cfg := indelligent.NewConfig(
        indelligent.OptMaxShift(20),
        indelligent.OptAlignAlleles(true),
    )
    ind := indelligent.New(cfg)

    res := ind.Analyze("TKGKKSCMW")
    fmt.Println("Allele 1:", res.Allele1)
    fmt.Println("Allele 2:", res.Allele2)
    fmt.Println("Combined:", res.Combined)
    fmt.Printf("Resolved: %d/%d (%.0f%%)\n",
        res.Stats.ResolvedPositions, res.Stats.Length, res.Stats.ResolvedPct)
}

For batch processing, reuse the same instance:

seqs := []string{"TKGKKSCMW", "ATCRYSMK", "WWSSMMKK"}
results := ind.AnalyzeMany(seqs) // concurrent, order-preserved

Docker

# Build
just docker

# Run web server
docker run -p 1977:1977 sfgrp/indelligent

# Analyze a sequence
docker run sfgrp/indelligent "TKGKKSCMW"

Development

# Run tests
go test ./...

# Run tests with race detector
go test -race -count=1 ./...

# Build
just build

# Build release binaries for all platforms
just release

Algorithm

The analysis pipeline processes IUPAC-encoded mixed sequencing traces through six stages:

  1. Parse -- Normalize input and build the base-pair matrix (MX1)
  2. Score -- Forward/reverse dynamic programming to produce score matrices (MX2/MX3)
  3. Call -- Assign alleles and detect phase shifts from scores (MX4)
  4. Resolve -- Iteratively resolve ambiguous positions using 23+ decision rules
  5. Align -- Reposition gaps (floating indels) left or right
  6. Build -- Reconstruct allele sequences, aligned sequences, and consensus

License

MIT

References

Dmitriev D.A. & Rakitov R.A. (2008) Decoding of superimposed traces produced by direct sequencing of heterozygous indels. PLoS Computational Biology, 4(7): e1000113.
