I'm using the ghcr.io/z-lab/paroquant:serve container with z-lab/Qwen3.5-9B-PARO. The server starts fine and exposes an OpenAI-compatible endpoint at /v1/chat/completions.
Problem 1: Can't disable thinking mode
According to the Qwen3.5 official docs, thinking can be disabled via request body:
{
  "extra_body": {
    "chat_template_kwargs": {"enable_thinking": false}
  }
}
However, these fields are silently ignored. Docker logs show:
The following fields were present in the request but ignored: {'repeat_penalty', 'extra_body'}
It appears --reasoning-parser qwen3 is hardcoded in the container startup, which forces thinking mode server-side and prevents per-request disabling.
Problem 2: JSON path extraction fails when thinking is enabled
When thinking is forced on, the model generates long <|im_start|>think...<|im_end|> blocks before the actual response. Standard JSON path extractors (like the ones used by Calibre plugins) expect clean text and get stuck parsing thinking content.
Current docker run command
docker run --pull=always --rm -it --gpus all --ipc=host -p 8888:8000 \
  -v C:\Users\User\Documents\Clonitaditos\Qwen3.5-9B-PARO\.cache\paroquant:/root/.cache/paroquant \
  ghcr.io/z-lab/paroquant:serve \
  --model z-lab/Qwen3.5-9B-PARO \
  --gpu-memory-utilization 0.9 \
  --max-num-seqs 1
Questions
- Is there a way to disable thinking mode from the docker run command? A flag like --enable-thinking false or --reasoning-parser none?
- Is there a planned option to make extra_body.chat_template_kwargs.enable_thinking respected at runtime?
- Any workaround for now (e.g., post-processing the response to strip <|im_start|>think...<|im_end|> blocks)?
Request format we tried
{
  "model": "z-lab/Qwen3.5-9B-PARO",
  "messages": [
    {"role": "system", "content": "Translate..."},
    {"role": "user", "content": "Hello world"}
  ],
  "temperature": 0.1,
  "top_p": 0.1,
  "top_k": 50,
  "repeat_penalty": 1.05,
  "min_p": 0.0,
  "extra_body": {
    "chat_template_kwargs": {"enable_thinking": false}
  }
}
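One thing that may be worth double-checking: in the OpenAI Python client, extra_body is a client-side parameter whose contents are merged into the top level of the JSON actually sent, so a server that receives a literal "extra_body" key will usually ignore it (which matches the log line above). If the request is POSTed as raw JSON, placing chat_template_kwargs at the top level may behave differently. A sketch (endpoint assumed from the docker run above; whether the server then honors the flag still depends on the hardcoded reasoning parser):

```python
import json
import urllib.request

# Endpoint assumed from the port mapping above (-p 8888:8000).
URL = "http://localhost:8888/v1/chat/completions"

payload = {
    "model": "z-lab/Qwen3.5-9B-PARO",
    "messages": [
        {"role": "system", "content": "Translate..."},
        {"role": "user", "content": "Hello world"},
    ],
    "temperature": 0.1,
    # chat_template_kwargs at the TOP level, not wrapped in extra_body:
    # extra_body is an OpenAI Python client concept that the client unwraps
    # before sending, so a raw request places these keys directly.
    "chat_template_kwargs": {"enable_thinking": False},
}

req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# response = urllib.request.urlopen(req)  # run against a live server
```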