(ServeReplica:megatron_model:MegatronRayDeployable pid=836, ip=100.65.137.96) File "/opt/Megatron-Bridge/3rdparty/Megatron-LM/megatron/core/transformer/transformer_layer.py", line 609, in _forward_attention [repeated 3x across cluster]
(ServeReplica:megatron_model:MegatronRayDeployable pid=836, ip=100.65.137.96) attention_output_with_bias = self.self_attention( [repeated 3x across cluster]
(ServeReplica:megatron_model:MegatronRayDeployable pid=836, ip=100.65.137.96) ^^^^^^^^^^^^^^^^^^^^ [repeated 3x across cluster]
(ServeReplica:megatron_model:MegatronRayDeployable pid=836, ip=100.65.137.96) self.config.cache_mla_latents [repeated 3x across cluster]
(ServeReplica:megatron_model:MegatronRayDeployable pid=836, ip=100.65.137.96) AssertionError: currently to use dynamic backend for MLA cache mla latents must be true [repeated 3x across cluster]
Describe the bug

Deploying the Megatron model with Ray Serve fails: each MegatronRayDeployable replica raises the AssertionError shown in the traceback above ("currently to use dynamic backend for MLA cache mla latents must be true") inside self_attention, so the service never starts serving requests.
Steps/Code to reproduce bug
Expected behavior
The Ray cluster can be deployed and the Megatron model replicas start without the MLA assertion error.
Additional context
Add any other context about the problem here.
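Possible workaround (untested, a minimal sketch): the assertion is raised on `self.config.cache_mla_latents` in `megatron/core/transformer/transformer_layer.py`, so enabling that flag on the transformer config before the model is built should satisfy the dynamic-backend check. The snippet below only illustrates where the flag would be set; `SimpleNamespace` is a stand-in for however the deployment actually constructs its Megatron TransformerConfig, and the helper name is hypothetical.

```python
from types import SimpleNamespace


def enable_mla_latent_caching(config):
    """Hypothetical helper: set the flag checked in transformer_layer.py
    (`self.config.cache_mla_latents`) so the dynamic inference backend's
    MLA assertion passes."""
    config.cache_mla_latents = True
    return config


# Stand-in for the real config object built by the deployment; the real
# TransformerConfig would be patched the same way *before* the model and
# the MegatronRayDeployable replica are constructed.
config = SimpleNamespace(cache_mla_latents=False)
enable_mla_latent_caching(config)
assert config.cache_mla_latents
```

Whether `cache_mla_latents` can also be set through the deployment's YAML/CLI config is not confirmed here; the sketch only reflects the attribute name visible in the traceback.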