SLOTH – Structural Loader with On-demand Traversal Handling

Lazy by design. Fast by default.

SLOTH is a fast, flexible mmCIF parser for structural biology workflows. Built on the C++ gemmi backend, it parses files eagerly but constructs Python objects lazily, which keeps it efficient for both large-scale pipelines and interactive exploration.
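
The split between eager parsing and lazy object construction can be illustrated with a small, library-independent sketch. This is not SLOTH's implementation: `LazyCategory` and `Row` are hypothetical names, and the hard-coded columns stand in for whatever the gemmi backend returns. Raw column values are held up front, and a row object is built only when that row is first requested.

```python
from dataclasses import dataclass, field


@dataclass
class Row:
    """One row of a category, materialized on demand (hypothetical class)."""
    values: dict


@dataclass
class LazyCategory:
    """Holds eagerly parsed columns; builds Row objects only when asked."""
    columns: dict                                  # item name -> list of raw strings
    _cache: dict = field(default_factory=dict)     # row index -> Row already built
    built: int = 0                                 # how many Row objects were constructed

    def __len__(self):
        # Row count is the length of any column (0 for an empty category).
        return len(next(iter(self.columns.values()), []))

    def row(self, index):
        # Construct the Row the first time it is requested, then reuse it.
        if index not in self._cache:
            self._cache[index] = Row({k: v[index] for k, v in self.columns.items()})
            self.built += 1
        return self._cache[index]


# Illustrative data only, not taken from a real entry.
atom_site = LazyCategory({
    "Cartn_x": ["10.1", "11.2", "12.3"],
    "type_symbol": ["N", "C", "C"],
})
first = atom_site.row(0)
again = atom_site.row(0)   # served from the cache, no new Row is built
```

In this sketch, touching one row out of three constructs exactly one `Row`; the other two stay as raw column entries until something asks for them.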

High-speed parsing via gemmi
Lazy construction of row and item objects for memory efficiency
Pythonic dot-notation access to mmCIF data
Multi-level validation — MMCIFValidator().validate() runs the full mmCIF dictionary + wwPDB rule suite and returns a ValidationReport
Schema-aware warnings — unknown categories/items trigger SchemaWarning with "Did you mean …?" suggestions
Tab completion & fuzzy matching — __dir__() exposes item/category/block names; typos produce helpful AttributeError messages
Pluggable validation with cross-category support and model-level plugin registration
JSON export/import with automatic relationship resolution
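
The tab-completion and "Did you mean …?" behaviour listed above can be approximated in plain Python with `__dir__`, `__getattr__`, and the standard-library `difflib` module. The sketch below is an assumption about the general technique, not SLOTH's actual code; the `Category` class and its message format are hypothetical.

```python
import difflib


class Category:
    """Hypothetical container giving dot access to item columns."""

    def __init__(self, name, items):
        self._name = name
        self._items = dict(items)

    def __dir__(self):
        # Expose item names so interactive shells can tab-complete them.
        return sorted(set(super().__dir__()) | set(self._items))

    def __getattr__(self, attr):
        # Called only when normal attribute lookup fails.
        items = self.__dict__.get("_items", {})
        if attr in items:
            return items[attr]
        # Suggest the closest known item name, if any is similar enough.
        close = difflib.get_close_matches(attr, list(items), n=1)
        hint = f" Did you mean '{close[0]}'?" if close else ""
        raise AttributeError(f"{self.__dict__.get('_name')} has no item '{attr}'.{hint}")


# Illustrative data only.
atom_site = Category("_atom_site", {"Cartn_x": ["10.1"], "type_symbol": ["N"]})
try:
    atom_site.Cartn_X          # typo: wrong case on the last letter
except AttributeError as exc:
    message = str(exc)
```

Here `message` carries the suggestion for `Cartn_x`, and `dir(atom_site)` includes both item names alongside the ordinary attributes.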

Installation

pip install -i https://test.pypi.org/simple/ sloth-mmcif

Or from source:

git clone https://github.com/lucas-ebi/sloth.git
cd sloth
pip install -e ".[dev]"

Quick Start

from sloth import MMCIFHandler

handler = MMCIFHandler()
mmcif = handler.read("1abc.cif")

# Dot notation
print(mmcif.data_1ABC._struct.title[0])
print(mmcif.data_1ABC._atom_site.Cartn_x[0])

# Dictionary notation
x = mmcif.data[0]["_atom_site"]["Cartn_x"]

# Export to nested JSON
handler.export(mmcif, file_path="output.json", indent=2)

Validation

from sloth import MMCIFValidator

# Full validation (dictionary schema + wwPDB rules)
vp = MMCIFValidator()
report = vp.validate(mmcif)
print(report.is_valid)      # True / False
print(report.errors)        # ERROR-level issues
print(report.warnings)      # WARNING-level issues
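
For scripts and CI jobs it can be handy to reduce a report to a single line. The helper below is a sketch that relies only on the three attributes shown above (`is_valid`, `errors`, `warnings`) and assumes that `errors` and `warnings` are sized collections; `summarize` is a hypothetical name, and a stand-in object is used so the example runs without SLOTH installed.

```python
from types import SimpleNamespace


def summarize(report):
    """Return a one-line summary; assumes errors/warnings support len()."""
    status = "valid" if report.is_valid else "invalid"
    return f"{status}: {len(report.errors)} error(s), {len(report.warnings)} warning(s)"


# Stand-in with the same three attributes as the report above (made-up content).
fake_report = SimpleNamespace(is_valid=False, errors=["example error"], warnings=[])
summary = summarize(fake_report)
print(summary)
```

With a real `ValidationReport`, the call would be `summarize(report)`, provided the assumption about `errors` and `warnings` holds.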

Performance

Benchmarks on synthetic mmCIF files (macOS, Python 3.10):

File Size   Full Parse   Selective   Access Speed   Memory (Parse)   Memory (Access)
1KB         12ms         13ms        40μs           198KB            4KB
10KB        12ms         13ms        97μs           222KB            13KB
100KB       13ms         14ms        594μs          1.0MB            104KB
1.0MB       19ms         25ms        6ms            7.7MB            954KB
50.7MB      394ms        693ms       298ms          205.4MB          46.1MB
102.0MB     817ms        1.4s        607ms          386.8MB          75.5MB

Note: Access memory can appear smaller than the file on disk because Python's string interning deduplicates repeated values in mmCIF columns (e.g., atom type symbols, residue names, chain IDs). When many rows share the same string, Python stores it only once — so memory usage after access reflects unique string content rather than total row count.
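
The deduplication effect described in the note can be reproduced in isolation with `sys.intern`. This is only a sketch of the general mechanism: how SLOTH or gemmi arranges for equal strings to be shared is not stated here, and the residue name is illustrative.

```python
import sys

# Build equal strings at runtime so they start out as separate objects.
residues = ["".join(["A", "L", "A"]) for _ in range(1000)]
distinct_before = len({id(s) for s in residues})

# sys.intern maps equal strings onto a single shared object.
interned = [sys.intern(s) for s in residues]
distinct_after = len({id(s) for s in interned})

print(distinct_before, distinct_after)
```

After interning, all 1000 list entries refer to one string object, so the memory cost of the values scales with the number of unique strings rather than the number of rows.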

Documentation

Full documentation, API reference, and interactive cookbook:

Read the Docs — User guide & API reference
Cookbook — Interactive Jupyter notebook tutorial

Contributing

Fork the repo
Create a feature branch
Add tests
Submit a PR

License

MIT License — use freely, modify responsibly.
