Add configurable grading rubrics #62

Open
AminMahpour wants to merge 3 commits into master from dev/configurable-grading-rubrics

@AminMahpour

Owner

Summary

  • add rubric configuration through the agent runtime, CLI, API, and TUI
  • update DeepAgent, Chainlit, and LangGraph configuration surfaces to use the new setting
  • add coverage for CLI, API, stream events, TUI, and RAG/runtime behavior

Testing

  • Added and updated unit tests across the affected entrypoints and runtime paths

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 668aa2337c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread chainagents_api.py
Comment on lines +196 to +200
"text": (
finalized.notice
if finalized.blocked
else f"\n\n{finalized.notice}"
),

P1: Preserve block semantics for streamed API responses

When agent.rubric.on_failure = "block" is used with /api/agent/stream, the original response_delta events have already been yielded before this final notice is emitted, so any client that reconstructs the answer by concatenating deltas receives the disallowed draft plus the block notice rather than a replacement. The non-streaming invoke path returns only finalized.response, but this streaming path does not provide a way to retract or replace the prior deltas.
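One possible way to address this, sketched under assumptions: the stream emits a distinct replacement event when the rubric blocks, and clients discard the accumulated deltas when they see it. The `response_replace` event name, the `Finalized` dataclass, and the `stream_events`, `collect`, and `reconstruct` helpers are hypothetical and not taken from this PR; only `response_delta`, `finalized.notice`, `finalized.blocked`, and `finalized.response` appear in the review above.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator


@dataclass
class Finalized:
    """Hypothetical stand-in for the rubric result object."""
    response: str
    notice: str
    blocked: bool


async def stream_events(deltas: list[str], finalized: Finalized) -> AsyncIterator[dict]:
    # Draft deltas are yielded first, as in the current streaming path.
    for delta in deltas:
        yield {"type": "response_delta", "text": delta}
    if finalized.blocked:
        # Assumed event type: tells the client to drop everything received so far.
        yield {"type": "response_replace", "text": finalized.notice}
    elif finalized.notice:
        yield {"type": "response_delta", "text": f"\n\n{finalized.notice}"}


async def collect(deltas: list[str], finalized: Finalized) -> list[dict]:
    return [event async for event in stream_events(deltas, finalized)]


def reconstruct(events: list[dict]) -> str:
    """Client-side reconstruction that honors the replacement event."""
    text = ""
    for event in events:
        if event["type"] == "response_replace":
            text = event["text"]
        elif event["type"] == "response_delta":
            text += event["text"]
    return text
```

With this shape, a client that only concatenates `response_delta` events would still see the draft, so the replacement event would need to be documented as part of the stream contract. Buffering deltas until grading finishes is an alternative that trades latency for simplicity.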

Comment thread main.py
Comment on lines +1555 to +1556
else:
await cl.Message(content=finalized.notice, author="System").send()

P2: Replace the Chainlit response when blocking

If the response has already been streamed in Chainlit and the rubric finishes with on_failure = "block", this branch only sends a separate System notice and leaves bridge.response_buffer/the assistant message containing the unapproved answer; finish() then exports and updates that original response. In streamed Chainlit runs this means the block policy does not actually replace the final assistant response.
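A minimal sketch of the replacement behavior described here, assuming the bridge keeps a reference to the streamed assistant message. `FakeMessage`, `Bridge`, and `apply_block` are hypothetical stand-ins so the example runs without Chainlit; the stand-in mimics a message object with a `content` attribute and an async `update()` method, and the real bridge in `main.py` may be structured differently.

```python
import asyncio


class FakeMessage:
    """Stand-in for the streamed assistant message (content + async update())."""

    def __init__(self, content: str = "") -> None:
        self.content = content
        self.update_calls = 0

    async def update(self) -> None:
        # A real UI message would push the new content to the client here.
        self.update_calls += 1


class Bridge:
    """Hypothetical bridge holding the streamed message and its text buffer."""

    def __init__(self, message: FakeMessage) -> None:
        self.message = message
        self.response_buffer = message.content


async def apply_block(bridge: Bridge, notice: str) -> None:
    # Overwrite both the buffer and the visible message so that a later
    # finish() exports the notice rather than the unapproved draft.
    bridge.response_buffer = notice
    bridge.message.content = notice
    await bridge.message.update()
```

The point of the sketch is that the buffer and the displayed message are updated together, so whatever `finish()` exports afterwards matches what the user sees.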

Comment thread deepagent_runtime.py
DEFAULT_RUBRIC_MAX_ITERATIONS,
)
if (
not isinstance(raw_max_iterations, int)

P2: Reject boolean max_iterations values

TOML booleans pass this check because bool is a subclass of int in Python, so max_iterations = true is accepted during config load and stored as True. When rubric grading is enabled, RubricMiddleware rejects boolean max_iterations at agent construction time, turning a config validation issue into a runtime startup/invocation failure.
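The check could be tightened along these lines. This is a sketch, not the code in `deepagent_runtime.py`: the function name `validate_max_iterations`, the `default` parameter (standing in for `DEFAULT_RUBRIC_MAX_ITERATIONS`, whose value is not shown here), and the lower bound of 1 are assumptions.

```python
def validate_max_iterations(raw, default=3):
    """Return a validated iteration count, or raise ValueError at config load."""
    if raw is None:
        return default
    # bool is a subclass of int, so isinstance(True, int) is True;
    # booleans have to be excluded explicitly before the int check.
    if isinstance(raw, bool) or not isinstance(raw, int):
        raise ValueError(f"max_iterations must be an integer, got {raw!r}")
    # Assumed lower bound; the real limits are not visible in the diff excerpt.
    if raw < 1:
        raise ValueError(f"max_iterations must be >= 1, got {raw}")
    return raw
```

Rejecting the value at load time keeps the failure next to the config file that caused it, instead of surfacing later when the agent is constructed.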

- configure active rubric to load prompts/scientific-rubric.md

- add ToolUniverse scientific tool-use rubric prompt

- delete obsolete skill directory

- disable async researcher, repo researcher, and related Chainlit commands