Pull Request Overview
This PR introduces an interactive benchmarking framework that supports both Docker and Singularity backends. It adds a unified interactive agent tester, a new Singularity sandbox manager, and a Singularity definition file for container image setup.
- Added a Singularity sandbox manager that offers a Docker-free alternative for container lifecycle operations.
- Created a new Singularity definition file to build the container image.
- Updated the interactive agent tester to conditionally import and use Docker or Singularity backends.
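The conditional import described in the last bullet could look roughly like the sketch below. It is only an illustration, not the PR's actual code: the `BACKEND_MODULES` mapping, the function names, and the Docker module path are assumptions (only the Singularity manager's file name appears in this PR).

```python
import importlib
from types import ModuleType

# Assumed module paths. Only the Singularity manager's file name is taken
# from this PR; the Docker module path is hypothetical.
BACKEND_MODULES = {
    "docker": "benchmarking.sandbox.benchmarking_sandbox_management",
    "singularity": "benchmarking.sandbox.benchmarking_sandbox_management_singularity",
}


def resolve_backend_module(choice: str) -> str:
    """Map a user-supplied backend name to the module that implements it."""
    key = choice.strip().lower()
    if key not in BACKEND_MODULES:
        valid = ", ".join(sorted(BACKEND_MODULES))
        raise ValueError(f"Unknown backend {choice!r}; expected one of: {valid}")
    return BACKEND_MODULES[key]


def load_backend(choice: str) -> ModuleType:
    """Import only the selected backend, so the other one's dependencies
    (for example the Docker SDK) do not need to be installed."""
    return importlib.import_module(resolve_backend_module(choice))
```

Deferring the import until the backend is known means a machine with only Singularity installed never touches the Docker-specific module, and vice versa.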
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| benchmarking/sandbox/benchmarking_sandbox_management_singularity.py | Adds Singularity-specific container lifecycle methods and REPL support. |
| benchmarking/sandbox/Singularity | Provides a Singularity definition file for building the container image. |
| benchmarking/InteractiveAgentTester.py | Implements backend selection logic for interactive benchmarking. |

    NB_USER="sandboxuser"
    NB_UID=1001
    NB_GID=1001
    su - =${NB_USER} # USER=${NB_USER}
The command 'su - =${NB_USER}' appears to include an extraneous '=' which likely causes an error. Please update it to 'su - ${NB_USER}'.
Suggested change
    - su - =${NB_USER} # USER=${NB_USER}
    + su - ${NB_USER} # USER=${NB_USER}
This pull request introduces a new interactive testing framework for benchmarking AI agents, supporting both Docker and Singularity backends. Key changes include the addition of a unified interactive tester, a Singularity sandbox manager, and a Singularity definition file for container setup.
New Interactive Testing Framework:
- InteractiveAgentTester.py: Added a unified interactive tester that supports both Docker and Singularity backends. It includes features like multi-turn GPT orchestration, FastAPI kernel execution, resource uploads, and an interactive chat loop. The backend is selected at runtime, and the implementation adapts to the chosen backend.
Singularity Support:
- Singularity: Added a Singularity definition file to create a container image (sandbox.sif). It sets up the environment, installs dependencies, and configures a non-root user for running the sandbox.
- benchmarking_sandbox_management_singularity.py: Introduced a Singularity sandbox manager to handle container lifecycle operations (build, start, stop, status, logs). It provides a Docker-free alternative for running the benchmarking sandbox.
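
As a rough illustration of the lifecycle operations listed above, the sketch below maps each action to a Singularity CLI invocation. It assumes the sandbox runs as a Singularity instance, which this description does not confirm; the instance name `benchmark_sandbox` and the helper `singularity_command` are hypothetical. Log retrieval is omitted because it depends on how the manager captures output.

```python
from typing import List

IMAGE = "sandbox.sif"           # image name given in the PR description
DEFINITION = "Singularity"      # definition file added by this PR
INSTANCE = "benchmark_sandbox"  # hypothetical instance name


def singularity_command(action: str) -> List[str]:
    """Return the argv list for a lifecycle action without executing it."""
    commands = {
        "build": ["singularity", "build", IMAGE, DEFINITION],
        "start": ["singularity", "instance", "start", IMAGE, INSTANCE],
        "stop": ["singularity", "instance", "stop", INSTANCE],
        "status": ["singularity", "instance", "list"],
    }
    if action not in commands:
        raise ValueError(f"Unsupported action: {action!r}")
    return commands[action]
```

A caller could pass the result to `subprocess.run(singularity_command("build"), check=True)`. Building from a definition file typically needs root or `--fakeroot`, depending on how Singularity is installed.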