Interactive benchmark tooling #3

Merged
djriffle merged 2 commits into main from InteractiveBenchmarkTooling on May 22, 2025

@djriffle
Member

This pull request introduces a new interactive testing framework for benchmarking AI agents, supporting both Docker and Singularity backends. Key changes include the addition of a unified interactive tester, a Singularity sandbox manager, and a Singularity definition file for container setup.

New Interactive Testing Framework:

  • InteractiveAgentTester.py: Added a unified interactive tester that supports both Docker and Singularity backends. It includes features like multi-turn GPT orchestration, FastAPI kernel execution, resource uploads, and an interactive chat loop. The backend is selected at runtime, and the implementation adapts to the chosen backend.
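The runtime backend selection described above could be sketched roughly as follows. This is a minimal illustration, not the code in InteractiveAgentTester.py: the `load_backend` helper and the `BACKEND_MODULES` table are hypothetical, and the Docker module name is an assumption (only the Singularity manager's file name appears in this PR).

```python
import importlib
from types import ModuleType
from typing import Callable

# Hypothetical mapping from backend name to module name.
# Only the Singularity file name is taken from this PR; the Docker
# module name is an assumption made for illustration.
BACKEND_MODULES = {
    "docker": "benchmarking_sandbox_management",
    "singularity": "benchmarking_sandbox_management_singularity",
}


def load_backend(
    name: str,
    importer: Callable[[str], ModuleType] = importlib.import_module,
) -> ModuleType:
    """Import and return the sandbox manager module for the chosen backend."""
    key = name.strip().lower()
    if key not in BACKEND_MODULES:
        raise ValueError(
            f"Unknown backend {name!r}; expected one of {sorted(BACKEND_MODULES)}"
        )
    # Import lazily so the unused backend's dependencies are never required.
    return importer(BACKEND_MODULES[key])
```

Importing only the selected module means a machine without Docker installed can still run the Singularity path, and vice versa. The `importer` parameter exists so the selection logic can be exercised without either backend present.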

Singularity Support:

  • Singularity: Added a Singularity definition file to create a container image (sandbox.sif). It sets up the environment, installs dependencies, and configures a non-root user for running the sandbox.
  • benchmarking_sandbox_management_singularity.py: Introduced a Singularity sandbox manager to handle container lifecycle operations (build, start, stop, status, logs). It provides a Docker-free alternative for running the benchmarking sandbox.
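As a rough sketch of how lifecycle operations can map onto the Singularity CLI, the helper below builds the argument list for each action. It assumes the manager runs the sandbox as a named Singularity instance, which this PR description does not confirm; the `lifecycle_command` function and the `benchmark_sandbox` instance name are hypothetical, and log retrieval is left out because its mechanism is not described here.

```python
from typing import List

SIF_IMAGE = "sandbox.sif"       # image name given in the PR description
DEF_FILE = "Singularity"        # definition file added in this PR
INSTANCE = "benchmark_sandbox"  # hypothetical instance name


def lifecycle_command(action: str) -> List[str]:
    """Return the Singularity CLI argument list for a lifecycle action."""
    commands = {
        # Build the image from the definition file.
        "build": ["singularity", "build", SIF_IMAGE, DEF_FILE],
        # Start a named background instance from the image.
        "start": ["singularity", "instance", "start", SIF_IMAGE, INSTANCE],
        # Stop the named instance.
        "stop": ["singularity", "instance", "stop", INSTANCE],
        # List running instances; the caller checks whether INSTANCE is present.
        "status": ["singularity", "instance", "list"],
    }
    if action not in commands:
        raise ValueError(f"Unsupported action {action!r}")
    return commands[action]
```

A caller would pass the result to `subprocess.run(lifecycle_command("start"), check=True)`. Depending on the host configuration, the build step may additionally need root privileges or the `--fakeroot` option.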

@djriffle djriffle requested a review from Copilot May 22, 2025 20:32
Contributor

Copilot AI left a comment

Pull Request Overview

This PR introduces an interactive benchmarking framework that supports both Docker and Singularity backends. It adds a unified interactive agent tester, a new Singularity sandbox manager, and a Singularity definition file for container image setup.

  • Added a Singularity sandbox manager that provides a Docker-free alternative for container lifecycle operations.
  • Created a new Singularity definition file to build the container image.
  • Updated the interactive agent tester to conditionally import and use Docker or Singularity backends.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File | Description
benchmarking/sandbox/benchmarking_sandbox_management_singularity.py | Adds Singularity-specific container lifecycle methods and REPL support.
benchmarking/sandbox/Singularity | Provides a Singularity definition file for building the container image.
benchmarking/InteractiveAgentTester.py | Implements backend selection logic for interactive benchmarking.

NB_USER="sandboxuser"
NB_UID=1001
NB_GID=1001
su - =${NB_USER} # USER=${NB_USER}

Copilot AI May 22, 2025

The command 'su - =${NB_USER}' appears to include an extraneous '=' which likely causes an error. Please update it to 'su - ${NB_USER}'.

Suggested change
- su - =${NB_USER} # USER=${NB_USER}
+ su - ${NB_USER} # USER=${NB_USER}

@djriffle djriffle merged commit a9365ac into main May 22, 2025
1 of 2 checks passed
@djriffle djriffle deleted the InteractiveBenchmarkTooling branch June 2, 2025 01:38