problem training #10

@jtuxman

Description

I have been having issues training appt. I provided input with my sequences in the required format: protein1_sequence, protein2_sequence, pkd. However, after training, no prediction works; it fails with the error below. I took the sequences from the ./data/Data.csv file, using only a subset of 300 pairs in the indicated format, but the result is the same error. Could you please help me? I am running the training inside the Docker container, with access to an L4 GPU with 24 GB of VRAM and 8 CPUs with 32 GB of RAM.
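For reference, this is roughly how I build the 300-pair subset. This is only a sketch: the in-memory string stands in for ./data/Data.csv, and whether the real file has a header row is an assumption I have not verified.

```python
# Sketch of the 300-pair subsetting step described above; the io.StringIO
# object stands in for ./data/Data.csv (assumed header-less, three columns).
import csv
import io

raw = io.StringIO(
    "MKTAYIAK,GSSGSSG,7.23\n"   # protein1_sequence, protein2_sequence, pkd
    "ACDEFGHI,KLMNPQRS,5.10\n"
)
rows = list(csv.reader(raw))
subset = rows[:300]  # the training run uses a 300-pair subset
print(len(subset), subset[0])
```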

python cli.py --sequences ABC DEF
2025-09-27 20:33:41,177 - INFO - Using device: cpu
2025-09-27 20:33:41,177 - INFO - Processing raw sequence pairs
2025-09-27 20:33:41,177 - INFO - Successfully loaded 1 protein pairs
/opt/conda/lib/python3.10/site-packages/torch/nn/modules/transformer.py:306: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.norm_first was True
warnings.warn(f"enable_nested_tensor is True, but self.use_nested_tensor is False because {why_not_sparsity_fast_path}")
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████| 2/2 [00:02<00:00, 1.01s/it]
2025-09-27 20:33:43,492 - INFO - Found existing cache file (123.16 MB)
2025-09-27 20:33:44,046 - INFO - Successfully loaded cache from embedding_cache_2560/caches.pt
2025-09-27 20:33:44,046 - INFO - Cache contains 12000 protein embeddings
2025-09-27 20:33:44,046 - INFO - Sample protein length: 11
2025-09-27 20:33:44,046 - INFO - Sample embedding shape: torch.Size([1, 2560])
2025-09-27 20:33:44,046 - INFO - Cache initialized at embedding_cache_2560
2025-09-27 20:33:44,046 - INFO - Initial cache size: 12000 embeddings
2025-09-27 20:33:44,046 - INFO - Sample embedding shape: torch.Size([1, 2560])
Traceback (most recent call last):
  File "/software/appt/cli.py", line 272, in <module>
    main()
  File "/software/appt/cli.py", line 230, in main
    trainer.model.load_state_dict(checkpoint['model_state_dict'])
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2189, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for ProteinProteinAffinityLM:
	Unexpected key(s) in state_dict: "transformer.layers.4.self_attn.in_proj_weight", "transformer.layers.4.self_attn.in_proj_bias", "transformer.layers.4.self_attn.out_proj.weight", "transformer.layers.4.self_attn.out_proj.bias", "transformer.layers.4.linear1.weight", "transformer.layers.4.linear1.bias", "transformer.layers.4.linear2.weight", "transformer.layers.4.linear2.bias", "transformer.layers.4.norm1.weight", "transformer.layers.4.norm1.bias", "transformer.layers.4.norm2.weight", "transformer.layers.4.norm2.bias".
size mismatch for protein_projection.weight: copying a param with shape torch.Size([448, 2560]) from checkpoint, the shape in current model is torch.Size([384, 2560]).
size mismatch for protein_projection.bias: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for transformer.layers.0.self_attn.in_proj_weight: copying a param with shape torch.Size([1344, 448]) from checkpoint, the shape in current model is torch.Size([1152, 384]).
size mismatch for transformer.layers.0.self_attn.in_proj_bias: copying a param with shape torch.Size([1344]) from checkpoint, the shape in current model is torch.Size([1152]).
size mismatch for transformer.layers.0.self_attn.out_proj.weight: copying a param with shape torch.Size([448, 448]) from checkpoint, the shape in current model is torch.Size([384, 384]).
size mismatch for transformer.layers.0.self_attn.out_proj.bias: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for transformer.layers.0.linear1.weight: copying a param with shape torch.Size([1792, 448]) from checkpoint, the shape in current model is torch.Size([1536, 384]).
size mismatch for transformer.layers.0.linear1.bias: copying a param with shape torch.Size([1792]) from checkpoint, the shape in current model is torch.Size([1536]).
size mismatch for transformer.layers.0.linear2.weight: copying a param with shape torch.Size([448, 1792]) from checkpoint, the shape in current model is torch.Size([384, 1536]).
size mismatch for transformer.layers.0.linear2.bias: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for transformer.layers.0.norm1.weight: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for transformer.layers.0.norm1.bias: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for transformer.layers.0.norm2.weight: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for transformer.layers.0.norm2.bias: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for transformer.layers.1.self_attn.in_proj_weight: copying a param with shape torch.Size([1344, 448]) from checkpoint, the shape in current model is torch.Size([1152, 384]).
size mismatch for transformer.layers.1.self_attn.in_proj_bias: copying a param with shape torch.Size([1344]) from checkpoint, the shape in current model is torch.Size([1152]).
size mismatch for transformer.layers.1.self_attn.out_proj.weight: copying a param with shape torch.Size([448, 448]) from checkpoint, the shape in current model is torch.Size([384, 384]).
size mismatch for transformer.layers.1.self_attn.out_proj.bias: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for transformer.layers.1.linear1.weight: copying a param with shape torch.Size([1792, 448]) from checkpoint, the shape in current model is torch.Size([1536, 384]).
size mismatch for transformer.layers.1.linear1.bias: copying a param with shape torch.Size([1792]) from checkpoint, the shape in current model is torch.Size([1536]).
size mismatch for transformer.layers.1.linear2.weight: copying a param with shape torch.Size([448, 1792]) from checkpoint, the shape in current model is torch.Size([384, 1536]).
size mismatch for transformer.layers.1.linear2.bias: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for transformer.layers.1.norm1.weight: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for transformer.layers.1.norm1.bias: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for transformer.layers.1.norm2.weight: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for transformer.layers.1.norm2.bias: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for transformer.layers.2.self_attn.in_proj_weight: copying a param with shape torch.Size([1344, 448]) from checkpoint, the shape in current model is torch.Size([1152, 384]).
size mismatch for transformer.layers.2.self_attn.in_proj_bias: copying a param with shape torch.Size([1344]) from checkpoint, the shape in current model is torch.Size([1152]).
size mismatch for transformer.layers.2.self_attn.out_proj.weight: copying a param with shape torch.Size([448, 448]) from checkpoint, the shape in current model is torch.Size([384, 384]).
size mismatch for transformer.layers.2.self_attn.out_proj.bias: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for transformer.layers.2.linear1.weight: copying a param with shape torch.Size([1792, 448]) from checkpoint, the shape in current model is torch.Size([1536, 384]).
size mismatch for transformer.layers.2.linear1.bias: copying a param with shape torch.Size([1792]) from checkpoint, the shape in current model is torch.Size([1536]).
size mismatch for transformer.layers.2.linear2.weight: copying a param with shape torch.Size([448, 1792]) from checkpoint, the shape in current model is torch.Size([384, 1536]).
size mismatch for transformer.layers.2.linear2.bias: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for transformer.layers.2.norm1.weight: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for transformer.layers.2.norm1.bias: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for transformer.layers.2.norm2.weight: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for transformer.layers.2.norm2.bias: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for transformer.layers.3.self_attn.in_proj_weight: copying a param with shape torch.Size([1344, 448]) from checkpoint, the shape in current model is torch.Size([1152, 384]).
size mismatch for transformer.layers.3.self_attn.in_proj_bias: copying a param with shape torch.Size([1344]) from checkpoint, the shape in current model is torch.Size([1152]).
size mismatch for transformer.layers.3.self_attn.out_proj.weight: copying a param with shape torch.Size([448, 448]) from checkpoint, the shape in current model is torch.Size([384, 384]).
size mismatch for transformer.layers.3.self_attn.out_proj.bias: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for transformer.layers.3.linear1.weight: copying a param with shape torch.Size([1792, 448]) from checkpoint, the shape in current model is torch.Size([1536, 384]).
size mismatch for transformer.layers.3.linear1.bias: copying a param with shape torch.Size([1792]) from checkpoint, the shape in current model is torch.Size([1536]).
size mismatch for transformer.layers.3.linear2.weight: copying a param with shape torch.Size([448, 1792]) from checkpoint, the shape in current model is torch.Size([384, 1536]).
size mismatch for transformer.layers.3.linear2.bias: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for transformer.layers.3.norm1.weight: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for transformer.layers.3.norm1.bias: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for transformer.layers.3.norm2.weight: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for transformer.layers.3.norm2.bias: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for affinity_head.0.weight: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for affinity_head.0.bias: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for affinity_head.1.weight: copying a param with shape torch.Size([256, 448]) from checkpoint, the shape in current model is torch.Size([160, 384]).
size mismatch for affinity_head.1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([160]).
size mismatch for affinity_head.4.weight: copying a param with shape torch.Size([128, 256]) from checkpoint, the shape in current model is torch.Size([80, 160]).
size mismatch for affinity_head.4.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([80]).
size mismatch for affinity_head.7.weight: copying a param with shape torch.Size([1, 128]) from checkpoint, the shape in current model is torch.Size([1, 80]).
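The mismatches above all point the same way: the checkpoint was saved from a model with projection dimension 448 and 5 transformer layers, while the model constructed at load time uses 384 and 4 layers, so `load_state_dict` cannot map the tensors. One way to confirm is to read the dimensions out of the checkpoint itself. A minimal sketch, assuming the state_dict key layout shown in the traceback (`infer_checkpoint_config` is a hypothetical helper, not part of appt):

```python
# Hypothetical helper: recover the hyperparameters a checkpoint was saved
# with by inspecting its state_dict tensor shapes.
def infer_checkpoint_config(shapes):
    """shapes: mapping of state_dict key -> tensor shape tuple."""
    # protein_projection maps the 2560-dim per-protein embedding to the model dim
    embed_dim = shapes["protein_projection.weight"][0]
    # layer indices appear as the third dot-separated field of the keys
    layer_ids = {int(key.split(".")[2])
                 for key in shapes
                 if key.startswith("transformer.layers.")}
    return {"embed_dim": embed_dim, "num_layers": max(layer_ids) + 1}

# Shapes copied from the error message above:
checkpoint_shapes = {
    "protein_projection.weight": (448, 2560),
    "transformer.layers.0.self_attn.in_proj_weight": (1344, 448),
    "transformer.layers.4.norm2.bias": (448,),
}
print(infer_checkpoint_config(checkpoint_shapes))
# -> {'embed_dim': 448, 'num_layers': 5}
```

In real use the shapes would come from `{k: tuple(v.shape) for k, v in torch.load(path, map_location="cpu")["model_state_dict"].items()}`. If that is indeed the cause, the model needs to be instantiated with the same hyperparameters the checkpoint was trained with (or the checkpoint retrained with the current defaults) before `load_state_dict` is called; `strict=False` would not help here, because the shared keys themselves have mismatched shapes.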
