1Cat-vLLM 1.1.0 on 4×V100 SXM2 16GB, TP=4 running Qwen3-30B-A3B-AWQ: single requests work fine, but concurrent requests fail with an error

### 🚀 The feature, motivation and pitch

GitHub issue version (more detailed):

Environment:

- GPU: Tesla V100 SXM2 16GB × 4
- 1Cat-vLLM: 1.1.0
- CUDA: 12.0
- Python: 3.12
- Model: Qwen3-30B-A3B-AWQ

Problem:

A single request works fine, but when 5 requests are sent concurrently the service crashes with:

`RuntimeError: Worker failed with error 'Shared memory exceeds 96KB: 99840 bytes'`

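
For reference, the numbers in the error message can be checked with a few lines of arithmetic. This is only a sketch and assumes that "96KB" in the message means 96 × 1024 bytes; the names `LIMIT_BYTES` and `REQUESTED_BYTES` are illustrative and do not come from 1Cat-vLLM. As far as I know, 96 KB is also the maximum shared memory per thread block on compute capability 7.0 devices such as the V100.

```python
# Assumption: "96KB" in the error message means 96 * 1024 bytes.
LIMIT_BYTES = 96 * 1024

# Value reported in the error message.
REQUESTED_BYTES = 99840

# How far the request exceeds the limit, in bytes and in KB.
overshoot = REQUESTED_BYTES - LIMIT_BYTES
requested_kb = REQUESTED_BYTES / 1024

print(f"limit: {LIMIT_BYTES} bytes")
print(f"requested: {REQUESTED_BYTES} bytes ({requested_kb} KB)")
print(f"over the limit by: {overshoot} bytes")
```

Under that assumption the request is only slightly above the limit, which may suggest that a kernel configuration chosen for larger batches needs a bit more shared memory than the one used for a single request.
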
Launch command:

`--tensor-parallel-size 4 --enforce-eager --dtype float16 --max-num-seqs 5`

Question: Is there a fix or workaround for this?
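
To help with reproduction, here is a minimal sketch of how the concurrent load could be generated. It assumes the server exposes the OpenAI-compatible `/v1/completions` endpoint at `http://localhost:8000` and that the served model name is `Qwen3-30B-A3B-AWQ`; both are assumptions about my setup and may differ elsewhere. The helper names `build_request`, `send`, and `fire_concurrently` are hypothetical and are not part of 1Cat-vLLM.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

BASE_URL = "http://localhost:8000"  # assumption: default host and port
MODEL = "Qwen3-30B-A3B-AWQ"         # assumption: served model name


def build_request(prompt, max_tokens=64):
    """Build a POST request for the (assumed) /v1/completions endpoint."""
    body = json.dumps(
        {"model": MODEL, "prompt": prompt, "max_tokens": max_tokens}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(prompt):
    """Send one request; return the HTTP status or the error as a string."""
    try:
        with urllib.request.urlopen(build_request(prompt), timeout=120) as resp:
            return resp.status
    except Exception as exc:  # report the failure instead of raising
        return repr(exc)


def fire_concurrently(prompts, send_fn=send):
    """Submit all prompts at once, one thread per prompt, keeping input order."""
    with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
        return list(pool.map(send_fn, prompts))
```

Calling `fire_concurrently([f"prompt {i}" for i in range(5)])` against a running server sends five requests at once, which matches the `--max-num-seqs 5` setting above. Calling it with a single prompt corresponds to the case that works.
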

### Alternatives

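Hello. With 1Cat-vLLM 1.1.0 on 4×V100 SXM2 16GB, running Qwen3-30B-A3B-AWQ with TP=4, single requests work without problems, but when multiple requests run concurrently the server fails with `RuntimeError: Worker failed with error 'Shared memory exceeds 96KB: 99840 bytes'`, and the service crashes and restarts. Is there a solution for this problem?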

### Additional context

_No response_

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.
