Remote MoE sharding fails before expert batch POST: decode_token_with_moe returned None during prefill

I reproduced the Gemma-4-26B-A4B MoE sharding setup.

I built a full vindex from google/gemma-4-26B-A4B-it:

larql extract /mnt/2T/models/hf/gemma-4-26B-A4B-it \
  -o output/gemma4-26b-a4b-q4k-fmd.vindex \
  --quant q4k \
  --feature-major-down

Local inference works:

larql run gemma4-26b-a4b-fmd --max-tokens 8 "The capital of France is"
=> Paris

Two expert servers start correctly:

larql serve gemma4-26b-a4b-fmd --port 8081 --experts 0-63
larql serve gemma4-26b-a4b-fmd --port 8082 --experts 64-127

Both servers report:

Down features Q4K: loaded
Endpoints: POST /v1/expert/batch

GET /v1/health succeeds on both.

But remote MoE inference fails:

larql run gemma4-26b-a4b-fmd \
  --moe-shards "0-63=http://127.0.0.1:8081,64-127=http://127.0.0.1:8082" \
  --max-tokens 1 \
  "The capital of France is"

Error:

decode_token_with_moe returned None during prefill

With --moe-dispatch batch, the error is:

decode returned None during prefill

The server logs show only GET /v1/health requests; no POST /v1/expert/batch request is ever sent.
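Because the failure happens before any POST is sent, one thing worth ruling out is the shard string itself. This is a minimal sketch of a hypothetical helper, `shard_for_expert`, which is not larql's parser: it assumes --moe-shards takes comma-separated `start-end=URL` entries with inclusive ranges, and resolves each expert ID to the server that should receive its batch.

```shell
# Hypothetical helper, NOT larql's actual parser.
# Assumes SPEC is comma-separated "start-end=url" entries with inclusive ranges,
# mirroring the --moe-shards value used in the failing command.
# Prints the URL owning EXPERT_ID, or returns 1 if no shard covers it.
shard_for_expert() {
  local spec="$1" expert="$2" rest entry range url lo hi
  rest="$spec"
  while [ -n "$rest" ]; do
    entry="${rest%%,*}"              # first comma-separated entry
    if [ "$entry" = "$rest" ]; then
      rest=""                        # that was the last entry
    else
      rest="${rest#*,}"              # drop the entry and its comma
    fi
    range="${entry%%=*}"             # e.g. "0-63"
    url="${entry#*=}"                # e.g. "http://127.0.0.1:8081"
    lo="${range%-*}"
    hi="${range#*-}"
    if [ "$expert" -ge "$lo" ] && [ "$expert" -le "$hi" ]; then
      printf '%s\n' "$url"
      return 0
    fi
  done
  return 1
}

SPEC="0-63=http://127.0.0.1:8081,64-127=http://127.0.0.1:8082"

# Check that every expert served by the two servers (0-127) maps to a URL.
missing=0
e=0
while [ "$e" -le 127 ]; do
  if ! shard_for_expert "$SPEC" "$e" >/dev/null; then
    echo "expert $e is not owned by any shard"
    missing=1
  fi
  e=$((e + 1))
done
if [ "$missing" -eq 0 ]; then
  echo "all experts 0-127 map to a shard URL"
fi
```

Run against the exact value passed to --moe-shards, every expert in 0-127 resolves to one of the two servers. If larql parses the flag the same way, the None during prefill is unlikely to come from an unmapped expert, though this sketch cannot confirm how larql itself reads the flag.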

Remote MoE sharding fails before expert batch POST: decode_token_with_moe returned None during prefill #146

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Remote MoE sharding fails before expert batch POST: decode_token_with_moe returned None during prefill #146

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions