Lazy by design. Fast by default.
SLOTH is a fast, flexible mmCIF parser for structural biology workflows. Built on the C++ gemmi backend, it performs eager parsing and lazy object construction — efficient for both large-scale pipelines and interactive exploration.
- High-speed parsing via gemmi
- Lazy construction of row and item objects for memory efficiency
- Pythonic dot-notation access to mmCIF data
- Multi-level validation: `MMCIFValidator().validate()` runs the full mmCIF dictionary + wwPDB rule suite and returns a `ValidationReport`
- Schema-aware warnings: unknown categories/items trigger `SchemaWarning` with "Did you mean …?" suggestions
- Tab completion & fuzzy matching: `__dir__()` exposes item/category/block names; typos produce helpful `AttributeError` messages
- Pluggable validation with cross-category support and model-level plugin registration
- JSON export/import with automatic relationship resolution
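
The lazy construction and "Did you mean …?" behaviour listed above can be illustrated with a small, self-contained sketch. This is not SLOTH's implementation: `LazyCategory`, its constructor arguments, and the `_built` cache are hypothetical names made up for this example, and the fuzzy matching uses the standard-library `difflib` purely as an assumption about how such suggestions could be produced.

```python
import difflib


class LazyCategory:
    """Hypothetical stand-in for a parsed mmCIF category (not SLOTH's class)."""

    def __init__(self, name, raw_columns):
        self._name = name
        self._raw = raw_columns  # item name -> list of raw strings (already parsed)
        self._built = {}         # item name -> tuple, filled on first access

    def __getattr__(self, item):
        # Only reached when normal attribute lookup fails, i.e. for item names.
        raw = self.__dict__.get("_raw", {})
        if item in raw:
            built = self.__dict__["_built"]
            if item not in built:
                built[item] = tuple(raw[item])  # construct the object on demand
            return built[item]
        # Unknown item: suggest the closest known name, if there is one.
        close = difflib.get_close_matches(item, list(raw), n=1)
        hint = f" Did you mean {close[0]!r}?" if close else ""
        raise AttributeError(f"{self._name} has no item {item!r}.{hint}")

    def __dir__(self):
        # Expose item names so interactive tab completion can find them.
        return sorted(set(super().__dir__()) | set(self._raw))


atom_site = LazyCategory("_atom_site", {
    "Cartn_x": ["1.0", "2.5"],
    "type_symbol": ["N", "C"],
})

print(atom_site.Cartn_x[0])  # builds only the Cartn_x column

try:
    atom_site.type_symbl     # deliberate typo
except AttributeError as err:
    print(err)
```

In this sketch only the columns that are actually touched are turned into Python objects, which is the general idea behind pairing eager parsing with lazy object construction.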

Install from TestPyPI:

    pip install -i https://test.pypi.org/simple/ sloth-mmcif

Or from source:

    git clone https://github.com/lucas-ebi/sloth.git
    cd sloth
    pip install -e ".[dev]"

Read a file and access its data:

    from sloth import MMCIFHandler

    handler = MMCIFHandler()
    mmcif = handler.read("1abc.cif")

    # Dot notation
    print(mmcif.data_1ABC._struct.title[0])
    print(mmcif.data_1ABC._atom_site.Cartn_x[0])

    # Dictionary notation
    x = mmcif.data[0]["_atom_site"]["Cartn_x"]

    # Export to nested JSON
    handler.export(mmcif, file_path="output.json", indent=2)

Run validation on the parsed file:

    from sloth import MMCIFValidator

    # Full validation (dictionary schema + wwPDB rules)
    vp = MMCIFValidator()
    report = vp.validate(mmcif)
    print(report.is_valid)   # True / False
    print(report.errors)     # ERROR-level issues
    print(report.warnings)   # WARNING-level issues

Benchmarks on synthetic mmCIF files (macOS, Python 3.10):

| File Size | Full Parse | Selective | Access Speed | Memory (Parse) | Memory (Access) |
|---|---|---|---|---|---|
| 1KB | 12ms | 13ms | 40μs | 198KB | 4KB |
| 10KB | 12ms | 13ms | 97μs | 222KB | 13KB |
| 100KB | 13ms | 14ms | 594μs | 1.0MB | 104KB |
| 1.0MB | 19ms | 25ms | 6ms | 7.7MB | 954KB |
| 50.7MB | 394ms | 693ms | 298ms | 205.4MB | 46.1MB |
| 102.0MB | 817ms | 1.4s | 607ms | 386.8MB | 75.5MB |
Note: Access memory can appear smaller than the file on disk because Python's string interning deduplicates repeated values in mmCIF columns (e.g., atom type symbols, residue names, chain IDs). When many rows share the same string, Python stores it only once — so memory usage after access reflects unique string content rather than total row count.
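
The deduplication effect described in the note can be reproduced in isolation. The sketch below assumes explicit interning through the standard-library `sys.intern`; CPython does not necessarily intern strings created at runtime on its own, and whether SLOTH or gemmi interns values this way is not stated here. The column contents are invented for illustration.

```python
import sys

# Build equal strings at runtime, the way a parser might produce them
# (hypothetical residue-name column with many repeated values).
column = ["".join(["A", "L", "A"]) for _ in range(1000)]

# sys.intern maps every equal string to one canonical object.
interned = [sys.intern(s) for s in column]

# Count how many distinct string objects the interned column references.
distinct_objects = len({id(s) for s in interned})
print(distinct_objects)
```

After interning, the 1000 rows all reference one shared string object, so the memory cost follows the number of unique values rather than the number of rows.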
Full documentation, API reference, and interactive cookbook:
- Read the Docs — User guide & API reference
- Cookbook — Interactive Jupyter notebook tutorial

To contribute:

- Fork the repo
- Create a feature branch
- Add tests
- Submit a PR

MIT License — use freely, modify responsibly.
