[Bug] Using Eagle3 + YaRN with the cpm.cu CLI for maximum inference efficiency fails with: TypeError: '<=' not supported between instances of 'NoneType' and 'float' #32

### Checklist

- [x] I have searched related issues.
- [x] The bug has not been fixed in the latest version.
- [x] Bug-related environment info are given.

### Describe the bug

When running inference with the cpm.cu CLI, I enabled Eagle3 + YaRN to get the best possible efficiency, but it does not work: TypeError: '<=' not supported between instances of 'NoneType' and 'float'.
The full log output is below:
(py312) root@c232e04a1e93:/workspace/cpm.cu# python -m cpmcu.cli \
    --model-path /workspace/model  \
    --draft-model-path /workspace/eagle3_model  \
    --prompt-text "介绍一下清华大学" \
    --temperature 0.6 \
    --use-stream true \
    --num-generate 32768 \
    --random-seed 0 \
    --block-window-size 2048 \
    --sparse-topk-k 64 \
    --use-eagle3 true 
╭──────────────────────────────────────────────────────────────────────────────────────────── CLI Configuration ─────────────────────────────────────────────────────────────────────────────────────────────╮
│                                                                                                                                                                                                            │
│  ╭───────────────────────────────────────────────────────────────────────────────────────── System Information ─────────────────────────────────────────────────────────────────────────────────────────╮  │
│  │  OS:                       Linux 5.15.0-113-generic                                                                                                                                                  │  │
│  │  Python:                                    3.12.11                                                                                                                                                  │  │
│  │  GPU:                NVIDIA GeForce RTX 4090 (23GB)                                                                                                                                                  │  │
│  │  CUDA:                                         12.6                                                                                                                                                  │  │
│  │  PyTorch:                                     2.7.1                                                                                                                                                  │  │
│  ╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯  │
│  ╭──────────────────────────────────────────────────────────────────────────────────────── Model Configuration ─────────────────────────────────────────────────────────────────────────────────────────╮  │
│  │  Model Path:                /workspace/model                                                                                                                                                         │  │
│  │  Model Type:                            auto                                                                                                                                                         │  │
│  │  Data Type:                          float16                                                                                                                                                         │  │
│  │  Draft Model:        /workspace/eagle3_model                                                                                                                                                         │  │
│  │  FRSpec Path:                           None                                                                                                                                                         │  │
│  │  MiniCPM4 YARN:                            ✗                                                                                                                                                         │  │
│  ╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯  │
│  ╭──────────────────────────────────────────────────────────────────────────────────────── System Configuration ────────────────────────────────────────────────────────────────────────────────────────╮  │
│  │  CUDA Graph:            ✓                                                                                                                                                                            │  │
│  │  Memory Limit:       0.90                                                                                                                                                                            │  │
│  │  Chunk Length:       2048                                                                                                                                                                            │  │
│  │  Plain Output:          ✗                                                                                                                                                                            │  │
│  ╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯  │
│  ╭──────────────────────────────────────────────────────────────────────────────────────── Speculative Decoding ────────────────────────────────────────────────────────────────────────────────────────╮  │
│  │  Window Size:         1024                                                                                                                                                                           │  │
│  │  Iterations:             2                                                                                                                                                                           │  │
│  │  Top-K per Iter:        10                                                                                                                                                                           │  │
│  │  Tree Size:             12                                                                                                                                                                           │  │
│  │  FRSpec Vocab Size:  32768                                                                                                                                                                           │  │
│  ╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯  │
│  ╭──────────────────────────────────────────────────────────────────────────────────────── Prompt Configuration ────────────────────────────────────────────────────────────────────────────────────────╮  │
│  │  Prompt File:        ✗                                                                                                                                                                               │  │
│  │  Prompt Text:        ✓                                                                                                                                                                               │  │
│  │  Use Chat Template:  ✓                                                                                                                                                                               │  │
│  ╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯  │
│  ╭────────────────────────────────────────────────────────────────────────────────────── Generation Configuration ──────────────────────────────────────────────────────────────────────────────────────╮  │
│  │  Max Tokens:         32768                                                                                                                                                                           │  │
│  │  Use Stream:             ✓                                                                                                                                                                           │  │
│  │  Ignore EOS:             ✗                                                                                                                                                                           │  │
│  │  Temperature:         0.60                                                                                                                                                                           │  │
│  │  Random Seed:            0                                                                                                                                                                           │  │
│  ╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯  │
│                                                                                                                                                                                                            │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

[09:57:51] STAGE    Setting up model paths                                                                                                                                                                    
           INFO     Auto-detected model type: minicpm4                                                                                                                                                        
           INFO     Draft model specified, enabling speculative decoding                                                                                                                                      
           SUCCESS  Setting up model paths (0.00s)                                                                                                                                                            
           INFO     Loaded tokenizer from: /workspace/model                                                                                                                                                   
           STAGE    Creating model instance                                                                                                                                                                   
           INFO     Creating model with Eagle speculative decoding                                                                                                                                            
You are using a model of type llama to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
[09:57:52] SUCCESS  Creating model instance (0.33s)                                                                                                                                                           
           INFO     Created model: LLM_with_eagle                                                                                                                                                             
           INFO     Input prompt: '<|im_start|>user\n介绍一下清华大学<|im_end|>\n<|im_start|>assistant\n'                                                                                                     
           INFO     Input tokens: 12                                                                                                                                                                          
           INFO     Initializing model storage...                                                                                                                                                             
           INFO     GPU Memory: 23.5GB total, 21.2GB allocated (90%)                                                                                                                                          
           INFO     Maximum context length under current memory limit: 125568 tokens                                                                                                                          
           INFO     Loading model weights...                                                                                                                                                                  
           INFO     load from /workspace/eagle3_model/pytorch_model.bin                                                                                                                                       
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/workspace/cpm.cu/cpmcu/cli.py", line 330, in <module>
    main() 
    ^^^^^^
  File "/workspace/cpm.cu/cpmcu/cli.py", line 316, in main
    run_generation(args)
  File "/workspace/cpm.cu/cpmcu/cli.py", line 285, in run_generation
    llm.load_from_hf()
  File "/workspace/cpm.cu/cpmcu/speculative/tree_drafter.py", line 60, in load_from_hf
    inv_freq, attention_scaling = ROPE_INIT_FUNCTIONS[rope_type](self.config, "cpu", seq_len=self.max_total_length)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/miniconda3/envs/py312/lib/python3.12/site-packages/transformers/modeling_rope_utils.py", line 322, in _compute_longrope_parameters
    if factor <= 1.0:
       ^^^^^^^^^^^^^
TypeError: '<=' not supported between instances of 'NoneType' and 'float'
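
The traceback ends in transformers' `_compute_longrope_parameters`, where `factor` is `None` when it reaches `if factor <= 1.0`. This first run was started without `--minicpm4-yarn` (the panel shows "MiniCPM4 YARN: ✗") and still took the longrope path. That suggests, though this is not verified, that the config being loaded already contains a longrope `rope_scaling` block with no `factor` key. The sketch below is a simplified re-creation of that branch, not the transformers source. It assumes, as far as I can tell from recent transformers releases, that the only other way `factor` gets set is from a top-level `original_max_position_embeddings` attribute. All config values are hypothetical.

```python
import math


def longrope_attention_factor(config: dict) -> float:
    """Simplified re-creation of the branch that fails in the traceback.

    Assumption: this only mimics how `factor` is resolved before the
    `if factor <= 1.0` check; it is not the transformers implementation.
    """
    rope_scaling = config["rope_scaling"]
    factor = rope_scaling.get("factor")  # None when the key is absent
    attention_factor = rope_scaling.get("attention_factor")

    if "original_max_position_embeddings" in config:
        # Top-level field present: factor is derived from the two lengths.
        original_max = config["original_max_position_embeddings"]
        factor = config["max_position_embeddings"] / original_max
    else:
        original_max = config["max_position_embeddings"]

    if attention_factor is None:
        if factor <= 1.0:  # raises TypeError when factor is still None
            attention_factor = 1.0
        else:
            attention_factor = math.sqrt(1 + math.log(factor) / math.log(original_max))
    return attention_factor


# Hypothetical longrope config: original_max_position_embeddings sits inside
# rope_scaling (not at top level) and there is no "factor" key.
cfg = {
    "max_position_embeddings": 65536,
    "rope_scaling": {
        "rope_type": "longrope",
        "long_factor": [1.0],
        "short_factor": [1.0],
        "original_max_position_embeddings": 32768,
    },
}

try:
    longrope_attention_factor(cfg)
except TypeError as exc:
    print(exc)  # '<=' not supported between instances of 'NoneType' and 'float'
```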
----------------------------------------------------------------------------------------------------------------------
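I have already run inference successfully without Eagle3, and the results are good: inputs of around 35k characters are handled well. Because the inputs are very long, I enabled the YaRN configuration. However, long-context understanding takes 180 seconds, which is too slow, so I wanted to add Eagle3 on top. The model weights were downloaded from the official Hugging Face repositories: "MiniCPM4.1-8B" and "MiniCPM4.1-8B-Eagle3". I asked an LLM what might cause this error and tried several of its suggestions, but none of them solved it. I look forward to your reply.

My use case is AI-assisted review on an electronic bidding and tendering platform. The material to be reviewed is split into groups of 35k characters each, the groups are reviewed in parallel, and the results are merged at the end. That is why I need to push runtime efficiency as far as possible: the faster each group runs, the less total GPU capacity I need.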
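
The review workflow described above (split into 35k-character groups, review them in parallel, merge the results) can be sketched roughly as follows. This is a minimal sketch under stated assumptions. Groups are split by raw character count, and parallelism uses a thread pool. `review_chunk` is a hypothetical placeholder for whatever call sends one group to a cpm.cu inference process. None of these names come from cpm.cu.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_CHARS = 35_000  # "3.5w" characters per group, as described above


def split_into_chunks(text: str, size: int = CHUNK_CHARS) -> list[str]:
    """Split the review material into consecutive groups of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def review_chunk(chunk: str) -> str:
    """Hypothetical placeholder: a real version would send the chunk to a
    cpm.cu inference process and return the model's review text."""
    return f"reviewed {len(chunk)} chars"


def review_document(text: str, max_workers: int = 4) -> str:
    chunks = split_into_chunks(text)
    # pool.map returns results in input order, so the merged output
    # follows the order of the original material.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(review_chunk, chunks))
    return "\n".join(results)


print(review_document("x" * 80_000))
```

On a single 24 GB card the groups would in practice compete for the same GPU. The thread pool only shows the split/merge structure, not a claim about achievable throughput.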

### Reproduction

script:
python -m cpmcu.cli \
    --model-path /workspace/model  \
    --draft-model-path /workspace/eagle3_model  \
    --prompt-text "介绍一下清华大学" \
    --temperature 0.6 \
    --num-generate 32768 \
    --random-seed 0 \
    --minicpm4-yarn true \
    --block-window-size 2048 \
    --sparse-topk-k 64 \
    --use-eagle3 true 
--------------------------------------------------------------------------------------------------------------


### Output

(py312) root@c232e04a1e93:/workspace/cpm.cu# python -m cpmcu.cli \
    --model-path /workspace/model  \
    --draft-model-path /workspace/eagle3_model  \
    --prompt-text "介绍一下清华大学" \
    --temperature 0.6 \
    --num-generate 32768 \
    --random-seed 0 \
    --minicpm4-yarn true \
    --block-window-size 2048 \
    --sparse-topk-k 64 \
    --use-eagle3 true 
╭──────────────────────────────────────────────────────────────────────────────────────────── CLI Configuration ─────────────────────────────────────────────────────────────────────────────────────────────╮
│                                                                                                                                                                                                            │
│  ╭───────────────────────────────────────────────────────────────────────────────────────── System Information ─────────────────────────────────────────────────────────────────────────────────────────╮  │
│  │  OS:                       Linux 5.15.0-113-generic                                                                                                                                                  │  │
│  │  Python:                                    3.12.11                                                                                                                                                  │  │
│  │  GPU:                NVIDIA GeForce RTX 4090 (23GB)                                                                                                                                                  │  │
│  │  CUDA:                                         12.6                                                                                                                                                  │  │
│  │  PyTorch:                                     2.7.1                                                                                                                                                  │  │
│  ╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯  │
│  ╭──────────────────────────────────────────────────────────────────────────────────────── Model Configuration ─────────────────────────────────────────────────────────────────────────────────────────╮  │
│  │  Model Path:                /workspace/model                                                                                                                                                         │  │
│  │  Model Type:                            auto                                                                                                                                                         │  │
│  │  Data Type:                          float16                                                                                                                                                         │  │
│  │  Draft Model:        /workspace/eagle3_model                                                                                                                                                         │  │
│  │  FRSpec Path:                           None                                                                                                                                                         │  │
│  │  MiniCPM4 YARN:                            ✓                                                                                                                                                         │  │
│  ╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯  │
│  ╭──────────────────────────────────────────────────────────────────────────────────────── System Configuration ────────────────────────────────────────────────────────────────────────────────────────╮  │
│  │  CUDA Graph:            ✓                                                                                                                                                                            │  │
│  │  Memory Limit:       0.90                                                                                                                                                                            │  │
│  │  Chunk Length:       2048                                                                                                                                                                            │  │
│  │  Plain Output:          ✗                                                                                                                                                                            │  │
│  ╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯  │
│  ╭──────────────────────────────────────────────────────────────────────────────────────── Speculative Decoding ────────────────────────────────────────────────────────────────────────────────────────╮  │
│  │  Window Size:         1024                                                                                                                                                                           │  │
│  │  Iterations:             2                                                                                                                                                                           │  │
│  │  Top-K per Iter:        10                                                                                                                                                                           │  │
│  │  Tree Size:             12                                                                                                                                                                           │  │
│  │  FRSpec Vocab Size:  32768                                                                                                                                                                           │  │
│  ╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯  │
│  ╭──────────────────────────────────────────────────────────────────────────────────────── Prompt Configuration ────────────────────────────────────────────────────────────────────────────────────────╮  │
│  │  Prompt File:        ✗                                                                                                                                                                               │  │
│  │  Prompt Text:        ✓                                                                                                                                                                               │  │
│  │  Use Chat Template:  ✓                                                                                                                                                                               │  │
│  ╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯  │
│  ╭────────────────────────────────────────────────────────────────────────────────────── Generation Configuration ──────────────────────────────────────────────────────────────────────────────────────╮  │
│  │  Max Tokens:         32768                                                                                                                                                                           │  │
│  │  Use Stream:             ✓                                                                                                                                                                           │  │
│  │  Ignore EOS:             ✗                                                                                                                                                                           │  │
│  │  Temperature:         0.60                                                                                                                                                                           │  │
│  │  Random Seed:            0                                                                                                                                                                           │  │
│  ╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯  │
│                                                                                                                                                                                                            │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

[10:13:02] STAGE    Setting up model paths                                                                                                                                                                    
           INFO     Auto-detected model type: minicpm4                                                                                                                                                        
           INFO     Draft model specified, enabling speculative decoding                                                                                                                                      
           SUCCESS  Setting up model paths (0.00s)                                                                                                                                                            
[10:13:03] INFO     Loaded tokenizer from: /workspace/model                                                                                                                                                   
           STAGE    Creating model instance                                                                                                                                                                   
           INFO     Creating model with Eagle speculative decoding                                                                                                                                            
You are using a model of type llama to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
           SUCCESS  Creating model instance (0.42s)                                                                                                                                                           
           INFO     Created model: LLM_with_eagle                                                                                                                                                             
           INFO     Input prompt: '<|im_start|>user\n介绍一下清华大学<|im_end|>\n<|im_start|>assistant\n'                                                                                                     
           INFO     Input tokens: 12                                                                                                                                                                          
           INFO     Initializing model storage...                                                                                                                                                             
           INFO     GPU Memory: 23.5GB total, 21.2GB allocated (90%)                                                                                                                                          
           INFO     Maximum context length under current memory limit: 125568 tokens                                                                                                                          
           INFO     Applied MiniCPM4 YARN rope_scaling parameters                                                                                                                                             
           INFO     Loading model weights...                                                                                                                                                                  
           INFO     load from /workspace/eagle3_model/pytorch_model.bin                                                                                                                                       
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/workspace/cpm.cu/cpmcu/cli.py", line 330, in <module>
    main() 
    ^^^^^^
  File "/workspace/cpm.cu/cpmcu/cli.py", line 316, in main
    run_generation(args)
  File "/workspace/cpm.cu/cpmcu/cli.py", line 285, in run_generation
    llm.load_from_hf()
  File "/workspace/cpm.cu/cpmcu/speculative/tree_drafter.py", line 60, in load_from_hf
    inv_freq, attention_scaling = ROPE_INIT_FUNCTIONS[rope_type](self.config, "cpu", seq_len=self.max_total_length)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/miniconda3/envs/py312/lib/python3.12/site-packages/transformers/modeling_rope_utils.py", line 322, in _compute_longrope_parameters
    if factor <= 1.0:
       ^^^^^^^^^^^^^
TypeError: '<=' not supported between instances of 'NoneType' and 'float'
(py312) root@c232e04a1e93:/workspace/cpm.cu# 
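
Suppose the cause is a longrope `rope_scaling` block that carries `original_max_position_embeddings` but no `factor`. That is an assumption drawn from the traceback and has not been checked against the MiniCPM4.1-8B or MiniCPM4.1-8B-Eagle3 `config.json` files. In that case, one possible workaround is to fill in `factor` before the RoPE initialization runs. The sketch below works on a plain config dict, for example one loaded from the draft model's `config.json`. `with_longrope_factor` is a hypothetical helper and has not been tested with cpm.cu. Where to hook it in is also an assumption; one candidate is just before the `ROPE_INIT_FUNCTIONS[rope_type](...)` call in `tree_drafter.py`. It is also unverified whether the resulting attention scaling matches MiniCPM4's own YaRN handling.

```python
import copy


def with_longrope_factor(config: dict) -> dict:
    """Return a copy of a longrope config with an explicit "factor".

    Hypothetical workaround: derive factor as
    max_position_embeddings / rope_scaling["original_max_position_embeddings"]
    so the `factor <= 1.0` check never sees None. The input is not modified.
    """
    patched = copy.deepcopy(config)
    rope_scaling = patched.get("rope_scaling") or {}
    if rope_scaling.get("rope_type") == "longrope" and rope_scaling.get("factor") is None:
        original_max = rope_scaling.get("original_max_position_embeddings")
        if original_max:
            rope_scaling["factor"] = patched["max_position_embeddings"] / original_max
    return patched


# Hypothetical values for illustration only.
cfg = {
    "max_position_embeddings": 65536,
    "rope_scaling": {"rope_type": "longrope", "original_max_position_embeddings": 32768},
}
print(with_longrope_factor(cfg)["rope_scaling"]["factor"])  # 2.0
```

Returning a deep copy keeps the original config untouched, so the patch can be compared side by side with the unpatched config while debugging.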

### Environment

My environment: cuda126_torch271_py312, NVIDIA RTX 4090 (24 GB), Ubuntu 22.04.
---------------------------------------------------------------------------
(py312) root@c232e04a1e93:/workspace/cpm.cu# pip list
Package                  Version      Editable project location
------------------------ ------------ -----------------------------
accelerate               1.12.0
aiohappyeyeballs         2.6.1
aiohttp                  3.13.3
aiosignal                1.4.0
annotated-doc            0.0.4
annotated-types          0.7.0
anyio                    4.12.1
asttokens                3.0.0
attrs                    25.4.0
certifi                  2026.1.4
charset-normalizer       3.4.4
click                    8.3.1
coloredlogs              15.0.1
comm                     0.2.1
cpmcu                    1.0.0
datasets                 4.5.0
debugpy                  1.8.11
decorator                5.1.1
dill                     0.4.0
distro                   1.9.0
einops                   0.8.2
executing                0.8.3
fastapi                  0.128.6
filelock                 3.13.1
flash_attn               2.8.2
flatbuffers              25.2.10
frozenlist               1.8.0
fschat                   0.2.36
fsspec                   2024.6.1
h11                      0.16.0
hf-xet                   1.2.0
httpcore                 1.0.9
httptools                0.7.1
httpx                    0.28.1
huggingface_hub          0.36.2
humanfriendly            10.0
idna                     3.11
infllm_v2                0.0.0        /workspace/infllmv2_cuda_impl
ipykernel                6.29.5
ipython                  9.1.0
ipython_pygments_lexers  1.1.1
jedi                     0.19.2
Jinja2                   3.1.4
jiter                    0.13.0
jupyter_client           8.6.3
jupyter_core             5.8.1
latex2mathml             3.78.1
markdown-it-py           4.0.0
markdown2                2.5.4
MarkupSafe               2.1.5
matplotlib-inline        0.1.6
mdurl                    0.1.2
mpmath                   1.3.0
multidict                6.7.1
multiprocess             0.70.18
nest-asyncio             1.6.0
networkx                 3.3
nh3                      0.3.2
ninja                    1.13.0
numpy                    2.1.2
nvidia-cublas-cu12       12.6.4.1
nvidia-cuda-cupti-cu12   12.6.80
nvidia-cuda-nvrtc-cu12   12.6.77
nvidia-cuda-runtime-cu12 12.6.77
nvidia-cudnn-cu12        9.5.1.17
nvidia-cufft-cu12        11.3.0.4
nvidia-cufile-cu12       1.11.1.6
nvidia-curand-cu12       10.3.7.77
nvidia-cusolver-cu12     11.7.1.2
nvidia-cusparse-cu12     12.5.4.2
nvidia-cusparselt-cu12   0.6.3
nvidia-nccl-cu12         2.26.2
nvidia-nvjitlink-cu12    12.6.85
nvidia-nvtx-cu12         12.6.77
onnxruntime-gpu          1.22.0
openai                   2.18.0
packaging                25.0
pandas                   3.0.0
parso                    0.8.4
pexpect                  4.9.0
pillow                   11.0.0
pip                      25.1
platformdirs             4.3.7
prompt-toolkit           3.0.43
propcache                0.4.1
protobuf                 6.31.1
psutil                   5.9.0
ptyprocess               0.7.0
pure-eval                0.2.2
pyarrow                  23.0.0
pydantic                 2.12.5
pydantic_core            2.41.5
Pygments                 2.19.1
python-dateutil          2.9.0.post0
python-dotenv            1.2.1
PyYAML                   6.0.3
pyzmq                    26.2.0
regex                    2026.1.15
requests                 2.32.5
rich                     14.3.2
safetensors              0.7.0
setuptools               78.1.1
shellingham              1.5.4
shortuuid                1.0.13
six                      1.17.0
sniffio                  1.3.1
stack-data               0.2.0
starlette                0.52.1
svgwrite                 1.4.3
sympy                    1.13.3
tiktoken                 0.12.0
tokenizers               0.22.2
torch                    2.7.1+cu126
torchaudio               2.7.1+cu126
torchvision              0.22.1+cu126
tornado                  6.5.1
tqdm                     4.67.3
traitlets                5.14.3
transformers             4.56.0
triton                   3.3.1
typer-slim               0.21.1
typing_extensions        4.15.0
typing-inspection        0.4.2
urllib3                  2.6.3
uv                       0.8.6
uvicorn                  0.40.0
uvloop                   0.22.1
watchfiles               1.1.1
wavedrom                 2.0.3.post3
wcwidth                  0.2.13
websockets               16.0
wheel                    0.45.1
xxhash                   3.6.0
yarl                     1.22.0
