(ServeReplica:megatron_model:MegatronRayDeployable pid=836, ip=100.65.137.96) File "/opt/Megatron-Bridge/3rdparty/Megatron-LM/megatron/core/transformer/transformer_layer.py", line 609, in _forward_attention [repeated 3x across cluster]
(ServeReplica:megatron_model:MegatronRayDeployable pid=836, ip=100.65.137.96) attention_output_with_bias = self.self_attention( [repeated 3x across cluster]
(ServeReplica:megatron_model:MegatronRayDeployable pid=836, ip=100.65.137.96) ^^^^^^^^^^^^^^^^^^^^ [repeated 3x across cluster]
(ServeReplica:megatron_model:MegatronRayDeployable pid=836, ip=100.65.137.96) self.config.cache_mla_latents [repeated 3x across cluster]
(ServeReplica:megatron_model:MegatronRayDeployable pid=836, ip=100.65.137.96) AssertionError: currently to use dynamic backend for MLA cache mla latents must be true [repeated 3x across cluster]
Describe the bug

Deploying the Megatron model with Ray Serve fails: each MegatronRayDeployable replica raises the AssertionError shown in the traceback above ("currently to use dynamic backend for MLA cache mla latents must be true") inside self_attention, so the service never starts serving requests.
Steps/Code to reproduce bug
Expected behavior
The Ray cluster can be deployed and the Megatron model replicas start without the MLA assertion error.
Additional context
Add any other context about the problem here.
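Possible workaround (untested, a minimal sketch): the assertion is raised on `self.config.cache_mla_latents` in `megatron/core/transformer/transformer_layer.py`, so enabling that flag on the transformer config before the model is built should satisfy the dynamic-backend check. The snippet below only illustrates where the flag would be set; `SimpleNamespace` is a stand-in for however the deployment actually constructs its Megatron TransformerConfig, and the helper name is hypothetical.

```python
from types import SimpleNamespace


def enable_mla_latent_caching(config):
    """Hypothetical helper: set the flag checked in transformer_layer.py
    (`self.config.cache_mla_latents`) so the dynamic inference backend's
    MLA assertion passes."""
    config.cache_mla_latents = True
    return config


# Stand-in for the real config object built by the deployment; the real
# TransformerConfig would be patched the same way *before* the model and
# the MegatronRayDeployable replica are constructed.
config = SimpleNamespace(cache_mla_latents=False)
enable_mla_latent_caching(config)
assert config.cache_mla_latents
```

Whether `cache_mla_latents` can also be set through the deployment's YAML/CLI config is not confirmed here; the sketch only reflects the attribute name visible in the traceback.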