for _ in range(reviewer_max_rounds):
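In the run_reviewer_and_finalize module, reviewer_max_rounds has no effect: the loop hits break immediately after verify_steps_batch, so the Reviewer stage lacks a feedback loop that re-reviews after the supplementary verification. Could the author check whether this is a problem?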
if reviewer_decision.get("need_more_steps"):
    pending_key_steps = reviewer_next_steps or []
    if not pending_key_steps:
        print(
            f"[{task_name}] No valid steps available to address reviewer issues. Stopping reviewer loop."
        )
        break
    verify_steps_batch(pending_key_steps)
    break
OS-Themis/trajectory_reward/completion_finalize.py
Line 63 in 9da51bb
OS-Themis/trajectory_reward/completion_finalize.py
Lines 189 to 197 in 9da51bb
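
For illustration, here is a minimal sketch of what a closed feedback loop could look like. It is not the repository's actual code: `reviewer_loop`, the `run_reviewer` callable, and its return shape are assumptions made up for this example. Only `reviewer_max_rounds`, `verify_steps_batch`, `reviewer_decision`, `reviewer_next_steps`, and `pending_key_steps` come from the snippet above. The one behavioral change is that there is no `break` after `verify_steps_batch`, so the next round reviews again.

```python
from typing import Callable, Dict, List, Tuple


def reviewer_loop(
    # Hypothetical callable: returns (reviewer_decision, reviewer_next_steps).
    run_reviewer: Callable[[], Tuple[Dict, List[str]]],
    verify_steps_batch: Callable[[List[str]], None],
    reviewer_max_rounds: int,
) -> Dict:
    """Run up to reviewer_max_rounds review rounds and return the last decision."""
    reviewer_decision: Dict = {}
    for _ in range(reviewer_max_rounds):
        reviewer_decision, reviewer_next_steps = run_reviewer()
        if not reviewer_decision.get("need_more_steps"):
            break  # reviewer is satisfied, nothing more to verify
        pending_key_steps = reviewer_next_steps or []
        if not pending_key_steps:
            break  # no actionable steps, so another round cannot help
        verify_steps_batch(pending_key_steps)
        # No break here: the next iteration re-runs the reviewer on the
        # newly verified steps, which is what makes the round limit matter.
    return reviewer_decision


# Usage with stand-in callables (assumed for this demo only).
calls = {"review": 0, "verified": []}


def fake_reviewer() -> Tuple[Dict, List[str]]:
    calls["review"] += 1
    if calls["review"] == 1:
        return {"need_more_steps": True}, ["step_3"]
    return {"need_more_steps": False}, []


def fake_verify(steps: List[str]) -> None:
    calls["verified"].extend(steps)


final_decision = reviewer_loop(fake_reviewer, fake_verify, reviewer_max_rounds=3)
```

One caveat with this shape: if the round budget runs out right after a verification batch, the returned decision predates that batch, so the caller may want one final review pass.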