OpenPI VLA train_step always fails server-side with "Operation failed" (SDK + HTTP demos) #1

@BobYounger

Hello, I've run into a bug: the OpenPI VLA train_step always returns a server-side "Operation failed" error. Is the VLA server pipeline operating normally?

Endpoint: https://mint-cn.macaron.xin/
Base model: openpi/pi0-fast-libero-low-mem-finetune (mintx.OPENPI_FAST_MODEL)
Reproduces 100%. Both SDK and HTTP demos hit the exact same error.

Failed request_ids (at the forward_backward / train_step stage):
- 5db5da02571649c3adeea9340900f23f (SDK demo, ~10:50 Beijing time, 2026-05-08)
- f21f7c3c684547bea5eeccdca3e58c4e (SDK demo, ~11:30 Beijing time, 2026-05-08)

Server response payload contains:
{"error": "Operation failed. Contact administrator if issue persists."}

Control: demos/rl/adapters/verifiable_math.py runs end-to-end on the same
account + endpoint (forward_backward + optim_step + save_weights all
succeed). So the failure is specific to the OpenPI VLA pipeline server-side,
not to this account, region, or SDK wrapping.

Earlier successful steps (both demos): create_session, create_model,
get_info. Failure happens at the actual forward/backward GPU execution.
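
The stage at which a run breaks can be pinned down mechanically. This is a rough sketch that assumes each step is wrapped in a zero-argument callable that raises on failure; `first_failing_stage` is a hypothetical helper, and the real SDK calls are not reproduced here.

```python
from typing import Callable, Optional, Sequence, Tuple

Stage = Tuple[str, Callable[[], object]]


def first_failing_stage(stages: Sequence[Stage]) -> Optional[Tuple[str, str]]:
    """Run stages in order; return (name, error message) for the first one that raises.

    Returns None if every stage completes without raising.
    """
    for name, call in stages:
        try:
            call()
        except Exception as exc:
            # Stop at the first failure so later stages do not run on bad state.
            return name, str(exc)
    return None
```

Passing the stages in the order create_session, create_model, get_info, forward_backward would be expected to report forward_backward as the first failing stage for the OpenPI VLA demos, and None for the verifiable_math control.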

Repo: mint-quickstart @ HEAD
mindlab-toolkit @ commit f0d3b21fe34a9419fe2c840036b44618e211a596
