proteina for peptide modifies #2
base: dev
JingXuu
commented
Oct 10, 2025
- Set the receptor as the motif and scaffold the peptide
- Change features: res_seq_pdb_idx (peptide + 200) and chain_break_per_res
- Add metrics calculation code
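The feature change can be pictured with a small sketch. It is not the PR's code: the helper name, the 1-based numbering, and the exact meaning of the two features are assumptions. It reads "peptide+200" as shifting the peptide's residue numbers by 200 past the receptor, and it treats chain_break_per_res as a per-residue flag set on the first residue after a numbering gap.

```python
PEPTIDE_INDEX_OFFSET = 200  # taken from the PR description ("peptide+200")


def build_residue_features(receptor_len, peptide_len, offset=PEPTIDE_INDEX_OFFSET):
    """Hypothetical helper: return (res_seq_pdb_idx, chain_break_per_res)."""
    # Receptor residues are numbered 1..receptor_len (assumed convention).
    receptor_idx = list(range(1, receptor_len + 1))
    # Assumption: peptide numbering continues after the receptor, shifted by `offset`.
    peptide_idx = [receptor_len + offset + i for i in range(1, peptide_len + 1)]
    res_seq_pdb_idx = receptor_idx + peptide_idx
    # Assumption: the flag is 1 on any residue whose index does not directly
    # follow the previous residue's index, and 0 elsewhere.
    chain_break_per_res = [0] + [
        int(curr - prev != 1)
        for prev, curr in zip(res_seq_pdb_idx, res_seq_pdb_idx[1:])
    ]
    return res_seq_pdb_idx, chain_break_per_res
```

With a 3-residue receptor and a 2-residue peptide, the sketch numbers the peptide 204 and 205 and flags the first peptide residue as a break.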
Pull Request Overview
This PR implements peptide design functionality for the proteina model by setting the receptor as a motif and designing scaffold peptides. The key modifications include chain-aware PDB writing, peptide-specific feature engineering, and comprehensive metrics calculation capabilities for evaluation.
Key changes:
- Enhanced PDB writing to support multi-chain structures with proper chain indexing
- Modified training pipeline to handle peptide-specific features including residue indexing with offsets and chain break markers
- Added comprehensive metrics calculation infrastructure including RMSD computation, structure diversity analysis, and designability assessment
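As a rough illustration of the RMSD metric mentioned in the overview, the following is a minimal sketch rather than the implementation in test/cal_metrics/. The function name is hypothetical, and it assumes the two coordinate sets are already superimposed and matched atom by atom; evaluation pipelines usually align the structures first (for example with the Kabsch algorithm).

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Sum the squared distance between each matched pair of atoms.
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))
```

The result is in the same unit as the input coordinates, so nanometer and angstrom inputs should not be mixed.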
Reviewed Changes
Copilot reviewed 29 out of 31 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| proteinfoundation/utils/ff_utils/pdb_utils.py | Enhanced PDB writing with chain support and motif-based structure organization |
| proteinfoundation/proteinflow/model_trainer_base.py | Added peptide features and validation RMSD tracking |
| proteinfoundation/motif_inference.py | Enhanced inference with PDB patching and RMSD calculation |
| test/cal_metrics/ | Complete metrics calculation suite including RMSD, diversity, and designability |
| configs/ | Configuration files for peptide training and inference |
      boundaries = torch.linspace(
    -     0.0, max_dist_boundary, num_dist_buckets - 1, device=pair_pred.device
    +     0.1, max_dist_boundary, num_dist_buckets - 1, device=pair_pred.device
Copilot
AI
Oct 10, 2025
Hardcoded value 0.1 for distance boundary start may cause issues with very short distances. Consider using 0.0 or making this configurable.
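To make the concern concrete, here is a small stdlib-only sketch of how the start of the boundary range changes bucketing. It is not the repository's code: `make_boundaries`, `bucket_index`, and the `min_dist_boundary` parameter are hypothetical, and it assumes a distance's bucket is the number of boundaries strictly below it.

```python
from bisect import bisect_left


def make_boundaries(max_dist_boundary, num_dist_buckets, min_dist_boundary=0.0):
    """Evenly spaced boundaries, mirroring linspace(start, stop, num_dist_buckets - 1)."""
    num = num_dist_buckets - 1
    if num == 1:
        return [min_dist_boundary]
    step = (max_dist_boundary - min_dist_boundary) / (num - 1)
    return [min_dist_boundary + i * step for i in range(num)]


def bucket_index(dist, boundaries):
    """Assumed bucketing rule: count the boundaries strictly smaller than `dist`."""
    return bisect_left(boundaries, dist)
```

Under these assumptions, a distance of 0.05 falls in bucket 1 when the range starts at 0.0 and in bucket 0 when it starts at 0.1, so exposing the start as a config value keeps that choice explicit.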
            n=n, shape=batch_shape, device=self.device, dtype=dtype, mask=mask
        )

        batch['x_1'] = x_1 #Ensure that the unit used during training is nanometers (nm) not ang!!!!!
Copilot
AI
Oct 10, 2025
Nonstandard abbreviation: 'ang' should be 'Å' or 'angstroms'.
Suggested change:
    - batch['x_1'] = x_1 #Ensure that the unit used during training is nanometers (nm) not ang!!!!!
    + batch['x_1'] = x_1 # Ensure that the unit used during training is nanometers (nm), not angstroms (Å)!
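The unit requirement in that comment could also be enforced in code rather than in a comment alone. The sketch below is an assumption-laden illustration: the helper name is hypothetical, and it assumes the incoming coordinates are in angstroms, which the PR does not state. The only fixed fact it relies on is that 1 nm equals 10 Å.

```python
ANGSTROMS_PER_NM = 10.0  # 1 nm = 10 Å


def angstrom_to_nm(coords):
    """Convert a list of [x, y, z] coordinates from angstroms to nanometers."""
    return [[value / ANGSTROMS_PER_NM for value in point] for point in coords]
```

Converting once at the data-loading boundary, and naming variables after their unit, makes a mixed-unit batch easier to spot.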
    if random.random() > 0.5 and self.cfg_exp.training.self_cond:
        x_pred_sc, _ = self.predict_clean(batch)
        batch["x_sc"] = self.detach_gradients(x_pred_sc)
    #import pdb; pdb.set_trace()
Copilot
AI
Oct 10, 2025
Commented-out debug statement should be removed from production code.
Suggested change:
    - #import pdb; pdb.set_trace()
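Leftover debugger hooks like this one are easy to catch mechanically. The following is a minimal sketch of such a check, not something the PR contains; the function name and pattern are hypothetical, and it deliberately flags commented-out hooks as well as live ones.

```python
import re

# Matches pdb.set_trace() and the built-in breakpoint(), commented out or not.
DEBUG_PATTERN = re.compile(r"\bpdb\.set_trace\(\)|\bbreakpoint\(\)")


def find_debug_statements(source):
    """Return (line_number, stripped_line) for every line that mentions a debugger hook."""
    return [
        (number, line.strip())
        for number, line in enumerate(source.splitlines(), start=1)
        if DEBUG_PATTERN.search(line)
    ]
```

A check like this could run over the changed files before a commit and fail when the returned list is non-empty.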
        motif_min_length = cfg.motif_min_length,
        motif_max_length = cfg.motif_max_length,
        motif_task_name = cfg.motif_task_name)
    #import pdb;pdb.set_trace()
Copilot
AI
Oct 10, 2025
Commented-out debug statement should be removed from production code.
Suggested change:
    - #import pdb;pdb.set_trace()
    Returns:
        List of sequences (strings)
    """
    # import pdb;pdb.set_trace()
Copilot
AI
Oct 10, 2025
Commented-out debug statement should be removed from production code.
Suggested change:
    - # import pdb;pdb.set_trace()
    # cal_metrics.py (only code & comments in English)
Copilot
AI
Oct 10, 2025
Orphaned comment that doesn't relate to surrounding code should be removed or clarified.
Suggested change:
    - # cal_metrics.py (only code & comments in English)
        chain_index=np.concatenate([prot_rec.chain_index, prot_pep.chain_index], axis=0),
        b_factors=np.concatenate([prot_rec.b_factors, prot_pep.b_factors], axis=0),
    )
    pdb_prot = to_pdb(full_prot, model=1, add_end=True) # Now `add_end=True` for automatic ENDMDL and END
Copilot
AI
Oct 10, 2025
Comment mentions 'Now add_end=True' but doesn't explain why this change was made or when it should be used.
Suggested change:
    - pdb_prot = to_pdb(full_prot, model=1, add_end=True) # Now `add_end=True` for automatic ENDMDL and END
    + # Use add_end=True to ensure ENDMDL and END records are included in the PDB output.
    + # This is necessary when writing combined structures (e.g., motif and peptide as separate chains)
    + # to properly terminate the PDB file for downstream compatibility. In other cases, set add_end=False
    + # if termination records are not required.
    + pdb_prot = to_pdb(full_prot, model=1, add_end=True)
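For readers unfamiliar with the termination records, the sketch below shows what wrapping a single model in MODEL/ENDMDL and closing the file with END looks like. It is a standalone illustration with a hypothetical helper, not the repository's `to_pdb`, and it makes no claim about how `add_end` is implemented there.

```python
def terminate_pdb(atom_lines, model=1):
    """Hypothetical helper: wrap ATOM/TER lines in MODEL ... ENDMDL and append END."""
    # MODEL record: record name in columns 1-6, model serial right-justified in columns 11-14.
    lines = [f"MODEL     {model:>4d}"]
    lines.extend(atom_lines)
    lines.append("ENDMDL")  # closes the model opened above
    lines.append("END")     # marks the end of the file
    return "\n".join(lines) + "\n"
```

Some downstream parsers tolerate a missing END record and others do not, which is one reason to emit it when writing combined receptor and peptide files.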
    pep_seqs = pep_seqss[1:]
    print(pep_seqs)
Copilot
AI
Oct 10, 2025
Debug print statement should be removed or replaced with proper logging.
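A minimal sketch of the logging alternative follows. The logger name and the wrapper function are hypothetical; the only detail taken from the snippet is that the first sequence is skipped.

```python
import logging

logger = logging.getLogger("cal_metrics")  # hypothetical logger name


def select_peptide_seqs(all_seqs):
    """Drop the first entry, as in the snippet, and log the rest at DEBUG level."""
    pep_seqs = all_seqs[1:]
    # Lazy %-formatting: the message is only built if DEBUG is enabled.
    logger.debug("peptide sequences: %s", pep_seqs)
    return pep_seqs
```

With this shape the output stays silent by default and can be switched on with `logging.basicConfig(level=logging.DEBUG)` when needed.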
    gt_base = os.path.basename(gt_pdb_path)
    m_case = re.search(r'([0-9][A-Za-z0-9]{3})', gt_base)
    case_id = m_case.group(1) if m_case else os.path.splitext(gt_base)[0]
    #import pdb;pdb.set_trace()
Copilot
AI
Oct 10, 2025
Another commented-out debug statement should be removed from production code.
Suggested change:
    - #import pdb;pdb.set_trace()
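The case-ID extraction in the last snippet can be exercised in isolation. The wrapper function below is hypothetical, but the regular expression and the fallback are copied from the snippet; the example file names are made up.

```python
import os
import re


def case_id_from_path(gt_pdb_path):
    """Extract a PDB-style ID (a digit followed by three alphanumerics) from a file name."""
    gt_base = os.path.basename(gt_pdb_path)
    m_case = re.search(r'([0-9][A-Za-z0-9]{3})', gt_base)
    # Fall back to the file name without its extension when nothing matches.
    return m_case.group(1) if m_case else os.path.splitext(gt_base)[0]
```

Because the pattern is unanchored, it returns the first digit-led run of four characters: `run2024_1abc.pdb` yields `2024` rather than `1abc`. If file names can contain other digits, a stricter pattern or an explicit naming convention may be worth considering.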