Merged
4 changes: 4 additions & 0 deletions README.md
@@ -425,6 +425,7 @@ files_mix: # PDB files for each crosslink type (required if mix_bool=true)

replace_bool: false # Enable crosslink replacement
ratio_replace: 30 # Percentage of crosslinks to replace (0-100)
ratio_replace_scope: "enzymatic" # Crosslinks to replace: "enzymatic" (default), "non_enzymatic" (AGEs), or "all"
replace_file: null # Input file, or null to use geometry output

# ================================================================================
@@ -620,10 +621,13 @@ n_term_combination: "9.C - 947.A"
c_term_combination: "1047.C - 104.C"
replace_bool: true
ratio_replace: 30 # Replace 30% of crosslinks
ratio_replace_scope: "enzymatic" # "enzymatic" (default), "non_enzymatic" (AGEs), or "all"
fibril_length: 40.0
contact_distance: 20
```

> **Note:** When you supply an input `pdb_file` together with `n_term_type`/`c_term_type`, ColBuilder verifies that the crosslinks in the PDB match the specified types. Mismatches (e.g. a trivalent PDB with `n_term_type: "HLKNL"`) stop generation with error `GEO_ERR_008`.

```bash
colbuilder --config_file config_replace_crosslinks.yaml
```
235 changes: 149 additions & 86 deletions config.yaml

Large diffs are not rendered by default.

8 changes: 8 additions & 0 deletions docs/configuration.md
@@ -69,6 +69,7 @@ files_mix:
# Replacement Options
replace_bool: false
ratio_replace: 30
ratio_replace_scope: "enzymatic" # enzymatic | non_enzymatic | all
replace_file: null

# Topology Options
@@ -182,6 +183,10 @@ These parameters control the sequence generation stage (homology modeling).
- A complete list of available species and combinations can be found at [src/colbuilder/data/sequence/crosslinks.csv](https://github.com/graeter-group/colbuilder/blob/main/src/colbuilder/data/sequence/crosslinks.csv).
- When using the mutated PDB workflow, run sequence generation separately first, then use the output in geometry generation.

**Crosslink type validation**:
- When you provide an input `pdb_file` together with `n_term_type`/`c_term_type`, ColBuilder checks that the crosslink residues present in the PDB are consistent with the specified types. Divalent types (HLKNL, LKNL, deHLNL, deHHLNL) expect a divalent structure, and trivalent types (PYD, DPD, PYL, DPL) expect a trivalent structure.
- If the PDB contains crosslinks of a category that was not requested (for example, a trivalent PDB while `n_term_type: "HLKNL"` is set), generation stops with error `GEO_ERR_008` and a message listing the detected crosslink residues. Update `n_term_type`/`c_term_type` to match the structure, or supply a PDB whose crosslinks match the specified types.
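
The consistency rule described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not ColBuilder's implementation: the function names, the exception class, and the idea of passing in an already-detected set of categories are all hypothetical. Only the type names and the `GEO_ERR_008` code come from the documentation.

```python
from typing import Iterable, Optional, Set

# Type names as listed in the documentation above.
DIVALENT_TYPES = {"HLKNL", "LKNL", "deHLNL", "deHHLNL"}
TRIVALENT_TYPES = {"PYD", "DPD", "PYL", "DPL"}


class CrosslinkTypeMismatch(ValueError):
    """Hypothetical exception standing in for the GEO_ERR_008 failure."""


def requested_categories(types: Iterable[Optional[str]]) -> Set[str]:
    """Map configured n_term_type/c_term_type values to categories."""
    categories = set()
    for name in types:
        if name in DIVALENT_TYPES:
            categories.add("divalent")
        elif name in TRIVALENT_TYPES:
            categories.add("trivalent")
    return categories


def check_consistency(detected: Set[str], types: Iterable[Optional[str]]) -> None:
    """Raise if the PDB holds a crosslink category that was not requested.

    `detected` is assumed to come from a separate scan of the PDB (not shown).
    """
    requested = requested_categories(types)
    unexpected = detected - requested
    if requested and unexpected:
        raise CrosslinkTypeMismatch(
            f"GEO_ERR_008: PDB contains {sorted(unexpected)} crosslinks, "
            f"but the configuration requests {sorted(requested)}"
        )
```

With this sketch, `check_consistency({"divalent"}, ["HLKNL", None])` returns silently, while `check_consistency({"trivalent"}, ["HLKNL", None])` raises, mirroring the trivalent-PDB example in the bullet above.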

## Geometry Generation Parameters

These parameters control the generation of the microfibril structure.
@@ -213,6 +218,7 @@ These parameters control advanced features for creating mixed crosslinked microf
| `files_mix` | list of strings | | Required if mix_bool is true, paths to PDB files with different crosslink types |
| `replace_bool` | boolean | false | Enable crosslink replacement (with lysines) |
| `ratio_replace` | integer | 30 | Percentage of crosslinks to replace (0-100) |
| `ratio_replace_scope` | string | "enzymatic" | Which crosslinks are eligible for ratio-based replacement: `enzymatic`, `non_enzymatic`, or `all` |
| `replace_file` | string, null | null | File with crosslinks to be replaced |

**Notes**:
@@ -221,6 +227,7 @@ These parameters control advanced features for creating mixed crosslinked microf
- The `files_mix` parameter specifies paths to PDB files of collagen molecules, each with a different crosslink type.
- The `replace_bool` feature simulates partial crosslinking or aged collagen by replacing some crosslinks with unmodified lysine residues.
- The `ratio_replace` parameter controls what percentage of crosslinks should be replaced.
- The `ratio_replace_scope` parameter selects which crosslinks may be replaced. **The default is `enzymatic`**, so ratio-based replacement targets enzymatic crosslinks (e.g. HLKNL/PYD-derived residues) unless you choose otherwise. Use `non_enzymatic` to target advanced glycation end-product (AGE) crosslinks (Glucosepane, Pentosidine, MOLD), or `all` to consider both.
- The `replace_file` parameter specifies the path to a PDB file of a previously generated collagen microfibril. Set to null to use the geometry generation output.
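
How `ratio_replace` and `ratio_replace_scope` might combine can be sketched as follows. This is an illustration only: the tuple representation of a crosslink, the function names, the rule that anything not in the AGE list counts as enzymatic, and the choice to apply the percentage to the eligible pool (with Python's `round`) are assumptions, not confirmed ColBuilder behavior.

```python
import random
from typing import List, Optional, Tuple

# AGE crosslink names as listed in the notes above.
AGE_TYPES = {"Glucosepane", "Pentosidine", "MOLD"}

Crosslink = Tuple[int, str]  # (hypothetical id, crosslink type name)


def eligible(crosslinks: List[Crosslink], scope: str) -> List[Crosslink]:
    """Filter crosslinks by scope; non-AGE types are assumed enzymatic."""
    if scope == "all":
        return list(crosslinks)
    if scope == "non_enzymatic":
        return [c for c in crosslinks if c[1] in AGE_TYPES]
    if scope == "enzymatic":
        return [c for c in crosslinks if c[1] not in AGE_TYPES]
    raise ValueError(f"unknown ratio_replace_scope: {scope!r}")


def pick_for_replacement(
    crosslinks: List[Crosslink],
    ratio_replace: float,
    scope: str = "enzymatic",
    seed: Optional[int] = None,
) -> List[Crosslink]:
    """Randomly choose ratio_replace percent of the eligible crosslinks."""
    pool = eligible(crosslinks, scope)
    count = round(len(pool) * ratio_replace / 100)
    return random.Random(seed).sample(pool, count)
```

In this sketch, a fibril with ten HLKNL and four Glucosepane crosslinks and `ratio_replace: 30` would yield three replacements under `enzymatic`, one under `non_enzymatic`, and four under `all`.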

## Topology Generation Parameters
@@ -399,6 +406,7 @@ Understanding how parameters interact is important for successful use of ColBuil
- Ratios in `ratio_mix` must sum to 100.
- If `replace_bool` is true, you must specify a valid percentage in `ratio_replace` (0-100).
- If `replace_bool` is true and `geometry_generator` is false, you must provide a `replace_file`.
- `ratio_replace_scope` must be one of `enzymatic` (default), `non_enzymatic`, or `all`.

5. **Geometry Generation Dependencies**:
- If `crystalcontacts_optimize` is true, the geometry generation will take longer but may produce better-packed microfibrils.
3 changes: 3 additions & 0 deletions docs/data_dictionary.md
@@ -174,6 +174,7 @@ The most commonly used parameters for ColBuilder configuration:
| files_mix | List of Paths | PDB files with different crosslink types | Valid PDB file paths (≥2 files) | [] |
| replace_bool | boolean | Replace crosslinks with lysines | true/false | false |
| ratio_replace | float | Percentage of crosslinks to replace | 0-100 | None |
| ratio_replace_scope | string | Which crosslinks are eligible for ratio-based replacement | "enzymatic", "non_enzymatic", "all" | "enzymatic" |
| replace_file | Path/null | Input PDB file of fibril with crosslinks | Valid file path or null | null |

**Validation Rules**:
@@ -186,10 +187,12 @@ The most commonly used parameters for ColBuilder configuration:
- `ratio_replace` must be between 0 and 100
- Either `geometry_generator=true` OR `replace_file` must be provided
- If `geometry_generator=false`, must provide `replace_file`
- `ratio_replace_scope` must be one of `enzymatic`, `non_enzymatic`, or `all`
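
The replacement rules listed above could be checked along these lines. This is a rough sketch using a plain dataclass; the class name, the `validate` method, and the decision to check the scope even when `replace_bool` is false are assumptions for illustration and do not reflect the project's actual configuration model.

```python
from dataclasses import dataclass
from typing import Optional

VALID_SCOPES = ("enzymatic", "non_enzymatic", "all")


@dataclass
class ReplaceOptions:
    """Hypothetical container for the replacement-related settings."""

    replace_bool: bool = False
    ratio_replace: float = 30
    ratio_replace_scope: str = "enzymatic"
    replace_file: Optional[str] = None
    geometry_generator: bool = True

    def validate(self) -> None:
        # Assumption: the scope is checked regardless of replace_bool.
        if self.ratio_replace_scope not in VALID_SCOPES:
            raise ValueError(
                f"ratio_replace_scope must be one of {VALID_SCOPES}, "
                f"got {self.ratio_replace_scope!r}"
            )
        if not self.replace_bool:
            return
        if not 0 <= self.ratio_replace <= 100:
            raise ValueError("ratio_replace must be between 0 and 100")
        if not self.geometry_generator and self.replace_file is None:
            raise ValueError(
                "replace_file is required when geometry_generator is false"
            )
```

Calling `ReplaceOptions(replace_bool=True, geometry_generator=False).validate()` raises in this sketch because no `replace_file` is given, while adding `replace_file="fibril.pdb"` (a placeholder path) passes.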

**Notes**:
- **Mixing** creates heterogeneous microfibrils with different crosslink types (e.g., 80% divalent + 20% trivalent)
- **Replacement** simulates partial crosslinking or aged collagen by replacing some crosslinks with unmodified lysine residues
- `ratio_replace_scope` selects which crosslinks may be replaced. The default `enzymatic` targets enzymatic crosslinks (HLKNL/PYD-derived residues); `non_enzymatic` targets AGE crosslinks (Glucosepane, Pentosidine, MOLD); `all` considers both
- Set `replace_file: null` to use geometry generation output for replacement

### Topology Generation Parameters
5 changes: 5 additions & 0 deletions docs/user_guide.md
@@ -272,9 +272,14 @@ files_mix: # Required if mix_bool is true
```yaml
replace_bool: false # Enable replacement of crosslinks with lysines
ratio_replace: 30 # Percentage of crosslinks to replace
ratio_replace_scope: "enzymatic" # Crosslinks to replace: "enzymatic" (default), "non_enzymatic" (AGEs), or "all"
replace_file: null # File with crosslinks to be replaced
```

The `ratio_replace_scope` parameter controls which crosslinks are eligible for ratio-based
replacement. The default `enzymatic` targets enzymatic crosslinks (HLKNL/PYD-derived residues);
use `non_enzymatic` to target AGE crosslinks (Glucosepane, Pentosidine, MOLD), or `all` for both.

### Topology Options

```yaml
12 changes: 5 additions & 7 deletions pyproject.toml
@@ -20,21 +20,19 @@ classifiers = [
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
]
dependencies = [
"numpy==1.25",
"vermouth==0.9.1",
# <2 keeps the numpy 1.x ABI (vermouth 0.9.1); >=1.25 lets pip pick a wheel
# that exists for the running Python (e.g. 1.26 on 3.12, where 1.25 has none).
"numpy>=1.25,<2",
"vermouth==0.9.1",
"pandas==2.2.2",
"biopython==1.84",
"scikit-learn",
"h5py",
"libnetcdf",
"threadpoolctl",
"PyYAML",
"click",
"pydantic>=2.0",
"tqdm",
"asyncio",
"colorama>=0.4.4",
"click>=8.0.0",
]
15 changes: 15 additions & 0 deletions src/colbuilder/colbuilder.py
@@ -511,6 +511,21 @@ async def run_pipeline(config: ColbuilderConfig) -> Dict[str, Path]:
f"Using generated sequence PDB for further processing: {sequence_pdb}"
)

# Validate that the crosslinks present in an input PDB are consistent
# with the crosslink types requested in the configuration. This catches
# the common mistake of, e.g., specifying a divalent type (HLKNL) for a
# trivalent (PYD) structure. Generated PDBs match by construction, so
# this only meaningfully guards user-provided structures.
if config.pdb_file and (config.n_term_type or config.c_term_type):
from colbuilder.core.utils.crosslink_detector import CrosslinkDetector

pdb_to_check = Path(config.pdb_file).resolve()
if pdb_to_check.exists():
CrosslinkDetector.validate_against_specified_types(
pdb_to_check,
[config.n_term_type, config.c_term_type],
)

# Topology-only mode
if (config.topology_generator and
not config.geometry_generator and
15 changes: 9 additions & 6 deletions src/colbuilder/core/geometry/chimera.py
@@ -165,12 +165,15 @@ def swapaa(self, replace: str, system_type: str) -> subprocess.CompletedProcess:
LOG.debug(
f"Chimera command completed with return code: {result.returncode}"
)
- if result.stdout:
- # LOG.debug(f"Chimera stdout: {result.stdout}")
- pass
- if result.stderr:
- # LOG.debug(f"Chimera stderr: {result.stderr}")
- pass
+ if result.returncode != 0:
+ # Surface failures: a silent swapaa failure leaves unpaired crosslink
+ # markers (non-standard residues) in the structure, which breaks
+ # downstream topology generation / simulation.
+ LOG.error(
+ f"Chimera swapaa failed (return code {result.returncode}); "
+ f"unpaired crosslink markers may remain unmutated. "
+ f"stderr: {result.stderr}"
+ )
return result
except Exception as e:
LOG.error(f"Error executing Chimera command: {str(e)}")
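
The pattern this hunk introduces, logging stderr when an external command exits non-zero instead of discarding it, can be shown in isolation. The sketch below is a generic standard-library illustration; `run_and_report` and the logger name are hypothetical, and no Chimera invocation is involved.

```python
import logging
import subprocess
import sys
from typing import List

LOG = logging.getLogger("swapaa_sketch")  # hypothetical logger name


def run_and_report(cmd: List[str]) -> subprocess.CompletedProcess:
    """Run a command, log an error on a non-zero exit, and return the result."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        LOG.error(
            "command failed (return code %s); stderr: %s",
            result.returncode,
            result.stderr,
        )
    return result


# A child process that writes to stderr and exits with code 3.
failing = [sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(3)"]
outcome = run_and_report(failing)
```

Returning the `CompletedProcess` unchanged, as the hunk does, leaves the decision to abort with the caller; the log line only makes the failure visible.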
16 changes: 10 additions & 6 deletions src/colbuilder/core/geometry/connect.py
@@ -142,12 +142,16 @@ def get_external_connect_file(
Any: Updated system object.
"""
if connect_file:
- self.external_connect = [
- float(l.split(" ")[0].replace(".caps.pdb", ""))
- for l in open(connect_file.with_suffix(".txt"), "r")
- ]
- if np.min(self.external_connect) > 0:
- self.external_connect = [i - 1 for i in self.external_connect]
+ with open(connect_file.with_suffix(".txt"), "r") as fh:
+ self.external_connect = [
+ float(l.split(" ")[0].replace(".caps.pdb", ""))
+ for l in fh
+ ]
+ # Connect files are written by write_connect() using the system's own
+ # (0-based) model ids, so they are already aligned with system.get_connect().
+ # The previous "if min > 0: subtract 1" heuristic wrongly shifted every id
+ # whenever model 0 happened to be absent from the file, mis-mapping
+ # connectivity. No shift is applied.

for model_id in system.get_connect().keys():
if model_id not in self.external_connect:
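
Why the removed "min > 0" heuristic mis-maps ids can be seen with a small, self-contained example. The helper names and the sample lines are made up for illustration; the line layout (`<id>.caps.pdb` as the first space-separated token) is inferred from the parsing expression in the hunk.

```python
from typing import List


def parse_model_ids(lines: List[str]) -> List[float]:
    """Extract the leading model id from each connect-file line."""
    return [float(line.split(" ")[0].replace(".caps.pdb", "")) for line in lines]


def old_shift(ids: List[float]) -> List[float]:
    """The removed heuristic: treat ids as 1-based if none of them is 0."""
    if min(ids) > 0:
        return [i - 1 for i in ids]
    return ids


# Hypothetical file content with 0-based ids where model 0 is simply absent.
sample_lines = ["1.caps.pdb 4.caps.pdb\n", "3.caps.pdb\n"]

ids = parse_model_ids(sample_lines)
shifted = old_shift(ids)
```

Here the file refers to models 1 and 3, but the old heuristic turns them into 0 and 2, so connectivity would be attached to the wrong models. Using `ids` unshifted, as the new code does, keeps them aligned with the 0-based ids the comment describes.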
16 changes: 8 additions & 8 deletions src/colbuilder/core/geometry/crosslink.py
@@ -97,9 +97,9 @@ def read_crosslink(pdb_file: Union[str, Path]) -> List[Crosslink]:
resname=line[17:20],
chain=line[21],
position=[
- float(line[29:38]),
+ float(line[30:38]),
float(line[38:46]),
- float(line[46:56]),
+ float(line[46:54]),
],
type="T",
)
@@ -113,9 +113,9 @@ def read_crosslink(pdb_file: Union[str, Path]) -> List[Crosslink]:
resname=line[17:20],
chain=line[21],
position=[
- float(line[29:38]),
+ float(line[30:38]),
float(line[38:46]),
- float(line[46:56]),
+ float(line[46:54]),
],
type="D",
)
@@ -129,9 +129,9 @@ def read_crosslink(pdb_file: Union[str, Path]) -> List[Crosslink]:
resname=line[17:20],
chain=line[21],
position=[
- float(line[29:38]),
+ float(line[30:38]),
float(line[38:46]),
- float(line[46:56]),
+ float(line[46:54]),
],
type="D",
)
@@ -145,9 +145,9 @@ def read_crosslink(pdb_file: Union[str, Path]) -> List[Crosslink]:
resname=line[17:20],
chain=line[21],
position=[
- float(line[29:38]),
+ float(line[30:38]),
float(line[38:46]),
- float(line[46:56]),
+ float(line[46:54]),
],
type="D",
)
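
The corrected slices follow the fixed-column layout of PDB coordinate records, where x, y and z occupy columns 31-38, 39-46 and 47-54 (1-based), i.e. Python slices `[30:38]`, `[38:46]` and `[46:54]`. The old `[46:56]` slice reached two columns into the occupancy field (columns 55-60). The sketch below builds a simplified record and parses it back; `format_hetatm`, `parse_xyz` and the residue name `LYX` are illustrative choices, not ColBuilder code.

```python
from typing import Tuple


def format_hetatm(serial: int, atom: str, resname: str, chain: str,
                  resseq: int, x: float, y: float, z: float) -> str:
    """Build a simplified fixed-width HETATM record (occupancy 1.00, B 0.00)."""
    return (
        f"HETATM{serial:>5} {atom:<4} {resname:>3} {chain}"
        f"{resseq:>4}    {x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}"
    )


def parse_xyz(line: str) -> Tuple[float, float, float]:
    """Read x, y, z from columns 31-38, 39-46 and 47-54 of a PDB record."""
    return float(line[30:38]), float(line[38:46]), float(line[46:54])


record = format_hetatm(1, "NZ", "LYX", "A", 9, 27.340, 24.430, 2.614)
coords = parse_xyz(record)
```

The same record also satisfies the `line[17:20]` (residue name) and `line[21]` (chain) slices used in the hunks above.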