Add configurable grading rubrics #62
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 668aa2337c
"text": (
    finalized.notice
    if finalized.blocked
    else f"\n\n{finalized.notice}"
),
Preserve block semantics for streamed API responses
When agent.rubric.on_failure = "block" is used with /api/agent/stream, the original response_delta events have already been yielded before this final notice is emitted, so any client that reconstructs the answer by concatenating deltas receives the disallowed draft plus the block notice rather than a replacement. The non-streaming invoke path returns only finalized.response, but this streaming path does not provide a way to retract or replace the prior deltas.
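One possible shape for a fix, sketched under assumptions: the stream emits a distinct replacement event when the rubric blocks, and clients discard prior deltas when they see it. The `response_replace` event type, the `Finalized` dataclass, and the helper names below are hypothetical and not taken from the PR; only `response_delta`, `finalized.notice`, and `finalized.blocked` appear in the original. In the real flow the rubric result is known only after streaming ends; it is passed in up front here to keep the sketch short.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, AsyncIterator


@dataclass
class Finalized:
    # Hypothetical stand-in for the PR's finalized rubric result.
    response: str
    notice: str
    blocked: bool


async def stream_with_rubric(
    deltas: AsyncIterator[str], finalized: Finalized
) -> AsyncIterator[dict[str, Any]]:
    """Yield draft deltas, then either append the notice or replace the draft."""
    async for delta in deltas:
        yield {"type": "response_delta", "text": delta}
    if finalized.blocked:
        # Hypothetical event type: tells the client to discard all prior deltas.
        yield {"type": "response_replace", "text": finalized.notice}
    else:
        yield {"type": "response_delta", "text": f"\n\n{finalized.notice}"}


def reconstruct(events: list[dict[str, Any]]) -> str:
    """Client-side reassembly that honors the replacement event."""
    text = ""
    for event in events:
        if event["type"] == "response_replace":
            text = event["text"]
        elif event["type"] == "response_delta":
            text += event["text"]
    return text


async def _fake_deltas() -> AsyncIterator[str]:
    # Simulated model output for the demonstration.
    for chunk in ("draft ", "answer"):
        yield chunk


async def collect(finalized: Finalized) -> list[dict[str, Any]]:
    return [event async for event in stream_with_rubric(_fake_deltas(), finalized)]
```

With this shape, a client that only concatenates `response_delta` events would still see the draft, so the replacement event would need to be documented as part of the streaming contract.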
else:
    await cl.Message(content=finalized.notice, author="System").send()
Replace the Chainlit response when blocking
If the response has already been streamed in Chainlit and the rubric finishes with on_failure = "block", this branch only sends a separate System notice, leaving both bridge.response_buffer and the assistant message containing the unapproved answer; finish() then exports and updates that original response. In streamed Chainlit runs, this means the block policy does not actually replace the final assistant response.
    DEFAULT_RUBRIC_MAX_ITERATIONS,
)
if (
    not isinstance(raw_max_iterations, int)
Reject boolean max_iterations values
TOML booleans pass this check because bool is a subclass of int in Python, so max_iterations = true is accepted during config load and stored as True. When rubric grading is enabled, RubricMiddleware rejects boolean max_iterations at agent construction time, turning a config validation issue into a runtime startup/invocation failure.
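A minimal sketch of the stricter check, assuming the loader should fail at config time: test for `bool` before `int`, since `isinstance(True, int)` is `True` in Python. The function name `parse_max_iterations`, the default value of 3, and the lower bound of 1 are assumptions for illustration; the PR's actual default and remaining conditions are not shown in the excerpt.

```python
DEFAULT_RUBRIC_MAX_ITERATIONS = 3  # assumed value; the real default is not shown


def parse_max_iterations(raw: object = DEFAULT_RUBRIC_MAX_ITERATIONS) -> int:
    """Validate a raw config value, rejecting booleans explicitly."""
    # bool is a subclass of int, so the bool check must come first.
    if isinstance(raw, bool) or not isinstance(raw, int) or raw < 1:
        raise ValueError(f"max_iterations must be a positive integer, got {raw!r}")
    return raw
```

Rejecting the value here would surface `max_iterations = true` as a config error during load instead of at agent construction.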
- configure active rubric to load prompts/scientific-rubric.md
- add ToolUniverse scientific tool-use rubric prompt
- delete obsolete skill directory
- disable async researcher, repo researcher, and related Chainlit commands
Summary
Testing