Server returns empty answer when too many tokens requested #652

@marta-sd

Describe the bug

When "max_tokens" in the payload is higher than the --inference_max_seq_length passed to the server (in my case, 8192 vs. 4096), the server responds with an empty assistant message instead of an error.

Steps/Code to reproduce bug

Deployment snippet (Eos cluster):

python \
  /opt/Export-Deploy/scripts/deploy/nlp/deploy_ray_inframework.py \
  --megatron_checkpoint /lustre/fsw/coreai_dlalgo_ci/nemo_export_deploy_eval_checkpoints/mbridge/meta-llama/Llama-3.1-8B-Instruct/iter_0000000/ \
  --model_id megatron_model \
  --port 8886 \
  --host 0.0.0.0 \
  --num_gpus 8 \
  --tensor_model_parallel_size 1 \
  --pipeline_model_parallel_size 1 \
  --expert_model_parallel_size 1 \
  --max_batch_size 2 \
  --num_replicas 8 \
  --inference_max_seq_length 4096 \
  --runtime_env '{"py_executable": "/opt/venv/bin/python"}' &

Then to send a request to the model:

import requests

model_name = "megatron_model"
endpoint_url = "http://0.0.0.0:8886/v1/chat/completions"

payload = {"model": model_name, "max_tokens": 8192, "top_p": 0.9999999, "temperature": 1e-07, "messages": [{"role": "user", "content": "## Instruction:\n\nPlease answer this question by first reasoning and then selecting the correct choice.\nPresent your reasoning and solution in the following json format.\nPlease show your choice in the `answer` field with only the choice letter, e.g.,`\"answer\": \"C\"`.\n\n```json\n{\n    \"reasoning\": \"___\",\n    \"answer\": \"___\"\n}\n```\n\n## Question:\n\nWhich of the following is a disorder characterized by uncontrollable episodes of falling asleep during the day?\n\n## Choices:\n\n- (A) Dyslexia\n- (B) Epilepsy\n- (C) Hydrocephalus\n- (D) Narcolepsy\n\n## Answer:"}]}

response = requests.post(endpoint_url, json=payload)
# Print the parsed body so the empty assistant message is visible when run as a script.
print(response.json())
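
Until the server reports this case itself, a client-side guard may help. The sketch below is only an illustration: the helper names `clamp_max_tokens` and `assistant_content` are hypothetical, and it assumes the endpoint returns an OpenAI-style body with the text under `choices[0].message.content`, which this issue does not confirm. It also only caps `max_tokens` at the server limit and does not account for prompt length.

```python
def clamp_max_tokens(payload, server_max_seq_length):
    """Return a copy of the payload with max_tokens capped at the server limit.

    server_max_seq_length is whatever was passed as --inference_max_seq_length
    at deployment time (4096 in this report); the client has to know it.
    """
    clamped = dict(payload)
    requested = payload.get("max_tokens", server_max_seq_length)
    clamped["max_tokens"] = min(requested, server_max_seq_length)
    return clamped


def assistant_content(response_json):
    """Extract the assistant text, or "" if it is missing or empty.

    Assumes an OpenAI-style chat completion schema (an assumption, not
    something stated in this issue).
    """
    choices = response_json.get("choices") or []
    if not choices:
        return ""
    message = choices[0].get("message") or {}
    return message.get("content") or ""
```

With these helpers, the request above could be sent as `requests.post(endpoint_url, json=clamp_max_tokens(payload, 4096))`, and an empty string from `assistant_content(response.json())` can be treated as a failure rather than as a valid answer.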

Expected behavior

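The server should respond with a descriptive error when "max_tokens" exceeds --inference_max_seq_length, rather than returning an empty assistant message.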
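
As a rough illustration of what such a check could look like, here is a minimal sketch. It is not the deployment script's actual code: `MaxTokensError`, `validate_max_tokens`, and `error_body` are hypothetical names, and the error body shape is an assumption modeled on OpenAI-style error responses.

```python
class MaxTokensError(ValueError):
    """Raised when a request asks for more tokens than the server allows."""


def validate_max_tokens(max_tokens, inference_max_seq_length):
    """Reject requests whose max_tokens exceeds the configured limit."""
    if max_tokens > inference_max_seq_length:
        raise MaxTokensError(
            f"max_tokens ({max_tokens}) exceeds the server's "
            f"--inference_max_seq_length ({inference_max_seq_length}); "
            "lower max_tokens or redeploy with a larger limit."
        )


def error_body(exc):
    """Build a JSON-serializable error body (hypothetical shape)."""
    return {"error": {"message": str(exc), "type": "invalid_request_error"}}
```

A request handler could call `validate_max_tokens` before running inference and return `error_body(exc)` with an HTTP 400 status when it raises, so the client sees the cause instead of an empty message.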

Labels: bug