problem training #10

@jtuxman

Description

I have been having issues training appt. I provided input with my sequences in the required format: protein1_sequence, protein2_sequence, pkd. However, after training, no prediction works; it fails with the error below. I took the sequences from the ./data/Data.csv file, using only a subset of 300 pairs in the indicated format, but the result is the same error. Could you please help me? I am running the training inside the Docker container, with access to an L4 GPU with 24 GB of VRAM and 8 CPUs with 32 GB of RAM.
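For reference, this is roughly how I build the 300-pair subset. This is only a sketch: the in-memory string stands in for ./data/Data.csv, and whether the real file has a header row is an assumption I have not verified.

```python
# Sketch of the 300-pair subsetting step described above; the io.StringIO
# object stands in for ./data/Data.csv (assumed header-less, three columns).
import csv
import io

raw = io.StringIO(
    "MKTAYIAK,GSSGSSG,7.23\n"   # protein1_sequence, protein2_sequence, pkd
    "ACDEFGHI,KLMNPQRS,5.10\n"
)
rows = list(csv.reader(raw))
subset = rows[:300]  # the training run uses a 300-pair subset
print(len(subset), subset[0])
```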

python cli.py --sequences ABC DEF
2025-09-27 20:33:41,177 - INFO - Using device: cpu
2025-09-27 20:33:41,177 - INFO - Processing raw sequence pairs
2025-09-27 20:33:41,177 - INFO - Successfully loaded 1 protein pairs
/opt/conda/lib/python3.10/site-packages/torch/nn/modules/transformer.py:306: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.norm_first was True
warnings.warn(f"enable_nested_tensor is True, but self.use_nested_tensor is False because {why_not_sparsity_fast_path}")
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████| 2/2 [00:02<00:00, 1.01s/it]
2025-09-27 20:33:43,492 - INFO - Found existing cache file (123.16 MB)
2025-09-27 20:33:44,046 - INFO - Successfully loaded cache from embedding_cache_2560/caches.pt
2025-09-27 20:33:44,046 - INFO - Cache contains 12000 protein embeddings
2025-09-27 20:33:44,046 - INFO - Sample protein length: 11
2025-09-27 20:33:44,046 - INFO - Sample embedding shape: torch.Size([1, 2560])
2025-09-27 20:33:44,046 - INFO - Cache initialized at embedding_cache_2560
2025-09-27 20:33:44,046 - INFO - Initial cache size: 12000 embeddings
2025-09-27 20:33:44,046 - INFO - Sample embedding shape: torch.Size([1, 2560])
Traceback (most recent call last):
  File "/software/appt/cli.py", line 272, in <module>
    main()
  File "/software/appt/cli.py", line 230, in main
    trainer.model.load_state_dict(checkpoint['model_state_dict'])
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2189, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for ProteinProteinAffinityLM:
	Unexpected key(s) in state_dict: "transformer.layers.4.self_attn.in_proj_weight", "transformer.layers.4.self_attn.in_proj_bias", "transformer.layers.4.self_attn.out_proj.weight", "transformer.layers.4.self_attn.out_proj.bias", "transformer.layers.4.linear1.weight", "transformer.layers.4.linear1.bias", "transformer.layers.4.linear2.weight", "transformer.layers.4.linear2.bias", "transformer.layers.4.norm1.weight", "transformer.layers.4.norm1.bias", "transformer.layers.4.norm2.weight", "transformer.layers.4.norm2.bias".
size mismatch for protein_projection.weight: copying a param with shape torch.Size([448, 2560]) from checkpoint, the shape in current model is torch.Size([384, 2560]).
size mismatch for protein_projection.bias: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for transformer.layers.0.self_attn.in_proj_weight: copying a param with shape torch.Size([1344, 448]) from checkpoint, the shape in current model is torch.Size([1152, 384]).
size mismatch for transformer.layers.0.self_attn.in_proj_bias: copying a param with shape torch.Size([1344]) from checkpoint, the shape in current model is torch.Size([1152]).
size mismatch for transformer.layers.0.self_attn.out_proj.weight: copying a param with shape torch.Size([448, 448]) from checkpoint, the shape in current model is torch.Size([384, 384]).
size mismatch for transformer.layers.0.self_attn.out_proj.bias: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for transformer.layers.0.linear1.weight: copying a param with shape torch.Size([1792, 448]) from checkpoint, the shape in current model is torch.Size([1536, 384]).
size mismatch for transformer.layers.0.linear1.bias: copying a param with shape torch.Size([1792]) from checkpoint, the shape in current model is torch.Size([1536]).
size mismatch for transformer.layers.0.linear2.weight: copying a param with shape torch.Size([448, 1792]) from checkpoint, the shape in current model is torch.Size([384, 1536]).
size mismatch for transformer.layers.0.linear2.bias: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for transformer.layers.0.norm1.weight: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for transformer.layers.0.norm1.bias: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for transformer.layers.0.norm2.weight: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for transformer.layers.0.norm2.bias: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for transformer.layers.1.self_attn.in_proj_weight: copying a param with shape torch.Size([1344, 448]) from checkpoint, the shape in current model is torch.Size([1152, 384]).
size mismatch for transformer.layers.1.self_attn.in_proj_bias: copying a param with shape torch.Size([1344]) from checkpoint, the shape in current model is torch.Size([1152]).
size mismatch for transformer.layers.1.self_attn.out_proj.weight: copying a param with shape torch.Size([448, 448]) from checkpoint, the shape in current model is torch.Size([384, 384]).
size mismatch for transformer.layers.1.self_attn.out_proj.bias: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for transformer.layers.1.linear1.weight: copying a param with shape torch.Size([1792, 448]) from checkpoint, the shape in current model is torch.Size([1536, 384]).
size mismatch for transformer.layers.1.linear1.bias: copying a param with shape torch.Size([1792]) from checkpoint, the shape in current model is torch.Size([1536]).
size mismatch for transformer.layers.1.linear2.weight: copying a param with shape torch.Size([448, 1792]) from checkpoint, the shape in current model is torch.Size([384, 1536]).
size mismatch for transformer.layers.1.linear2.bias: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for transformer.layers.1.norm1.weight: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for transformer.layers.1.norm1.bias: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for transformer.layers.1.norm2.weight: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for transformer.layers.1.norm2.bias: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for transformer.layers.2.self_attn.in_proj_weight: copying a param with shape torch.Size([1344, 448]) from checkpoint, the shape in current model is torch.Size([1152, 384]).
size mismatch for transformer.layers.2.self_attn.in_proj_bias: copying a param with shape torch.Size([1344]) from checkpoint, the shape in current model is torch.Size([1152]).
size mismatch for transformer.layers.2.self_attn.out_proj.weight: copying a param with shape torch.Size([448, 448]) from checkpoint, the shape in current model is torch.Size([384, 384]).
size mismatch for transformer.layers.2.self_attn.out_proj.bias: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for transformer.layers.2.linear1.weight: copying a param with shape torch.Size([1792, 448]) from checkpoint, the shape in current model is torch.Size([1536, 384]).
size mismatch for transformer.layers.2.linear1.bias: copying a param with shape torch.Size([1792]) from checkpoint, the shape in current model is torch.Size([1536]).
size mismatch for transformer.layers.2.linear2.weight: copying a param with shape torch.Size([448, 1792]) from checkpoint, the shape in current model is torch.Size([384, 1536]).
size mismatch for transformer.layers.2.linear2.bias: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for transformer.layers.2.norm1.weight: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for transformer.layers.2.norm1.bias: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for transformer.layers.2.norm2.weight: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for transformer.layers.2.norm2.bias: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for transformer.layers.3.self_attn.in_proj_weight: copying a param with shape torch.Size([1344, 448]) from checkpoint, the shape in current model is torch.Size([1152, 384]).
size mismatch for transformer.layers.3.self_attn.in_proj_bias: copying a param with shape torch.Size([1344]) from checkpoint, the shape in current model is torch.Size([1152]).
size mismatch for transformer.layers.3.self_attn.out_proj.weight: copying a param with shape torch.Size([448, 448]) from checkpoint, the shape in current model is torch.Size([384, 384]).
size mismatch for transformer.layers.3.self_attn.out_proj.bias: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for transformer.layers.3.linear1.weight: copying a param with shape torch.Size([1792, 448]) from checkpoint, the shape in current model is torch.Size([1536, 384]).
size mismatch for transformer.layers.3.linear1.bias: copying a param with shape torch.Size([1792]) from checkpoint, the shape in current model is torch.Size([1536]).
size mismatch for transformer.layers.3.linear2.weight: copying a param with shape torch.Size([448, 1792]) from checkpoint, the shape in current model is torch.Size([384, 1536]).
size mismatch for transformer.layers.3.linear2.bias: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for transformer.layers.3.norm1.weight: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for transformer.layers.3.norm1.bias: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for transformer.layers.3.norm2.weight: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for transformer.layers.3.norm2.bias: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for affinity_head.0.weight: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for affinity_head.0.bias: copying a param with shape torch.Size([448]) from checkpoint, the shape in current model is torch.Size([384]).
size mismatch for affinity_head.1.weight: copying a param with shape torch.Size([256, 448]) from checkpoint, the shape in current model is torch.Size([160, 384]).
size mismatch for affinity_head.1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([160]).
size mismatch for affinity_head.4.weight: copying a param with shape torch.Size([128, 256]) from checkpoint, the shape in current model is torch.Size([80, 160]).
size mismatch for affinity_head.4.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([80]).
size mismatch for affinity_head.7.weight: copying a param with shape torch.Size([1, 128]) from checkpoint, the shape in current model is torch.Size([1, 80]).
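The mismatches above all point the same way: the checkpoint was saved from a model with projection dimension 448 and 5 transformer layers, while the model constructed at load time uses 384 and 4 layers, so `load_state_dict` cannot map the tensors. One way to confirm is to read the dimensions out of the checkpoint itself. A minimal sketch, assuming the state_dict key layout shown in the traceback (`infer_checkpoint_config` is a hypothetical helper, not part of appt):

```python
# Hypothetical helper: recover the hyperparameters a checkpoint was saved
# with by inspecting its state_dict tensor shapes.
def infer_checkpoint_config(shapes):
    """shapes: mapping of state_dict key -> tensor shape tuple."""
    # protein_projection maps the 2560-dim per-protein embedding to the model dim
    embed_dim = shapes["protein_projection.weight"][0]
    # layer indices appear as the third dot-separated field of the keys
    layer_ids = {int(key.split(".")[2])
                 for key in shapes
                 if key.startswith("transformer.layers.")}
    return {"embed_dim": embed_dim, "num_layers": max(layer_ids) + 1}

# Shapes copied from the error message above:
checkpoint_shapes = {
    "protein_projection.weight": (448, 2560),
    "transformer.layers.0.self_attn.in_proj_weight": (1344, 448),
    "transformer.layers.4.norm2.bias": (448,),
}
print(infer_checkpoint_config(checkpoint_shapes))
# -> {'embed_dim': 448, 'num_layers': 5}
```

In real use the shapes would come from `{k: tuple(v.shape) for k, v in torch.load(path, map_location="cpu")["model_state_dict"].items()}`. If that is indeed the cause, the model needs to be instantiated with the same hyperparameters the checkpoint was trained with (or the checkpoint retrained with the current defaults) before `load_state_dict` is called; `strict=False` would not help here, because the shared keys themselves have mismatched shapes.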
