Interactive benchmark tooling by djriffle · Pull Request #3 · OpenTechBio/Olaf

djriffle · 2025-05-22T20:32:06Z

This pull request introduces a new interactive testing framework for benchmarking AI agents, supporting both Docker and Singularity backends. Key changes include the addition of a unified interactive tester, a Singularity sandbox manager, and a Singularity definition file for container setup.

New Interactive Testing Framework:

InteractiveAgentTester.py: Added a unified interactive tester that supports both Docker and Singularity backends. It includes features like multi-turn GPT orchestration, FastAPI kernel execution, resource uploads, and an interactive chat loop. The backend is selected at runtime, and the implementation adapts to the chosen backend.
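
The runtime backend selection described above could look roughly like the following. This is a minimal sketch, not the code in InteractiveAgentTester.py: the function names are illustrative, and the Docker module name is a hypothetical placeholder (only the Singularity module name appears in this PR).

```python
import importlib
from types import ModuleType

# Module name per backend. The Singularity entry matches the file added in this
# PR; the Docker entry is an assumed placeholder, not confirmed by the PR text.
BACKEND_MODULES = {
    "docker": "benchmarking_sandbox_management",  # hypothetical name
    "singularity": "benchmarking_sandbox_management_singularity",
}


def resolve_backend_module(backend: str) -> str:
    """Map a user-supplied backend name to the module that implements it."""
    key = backend.strip().lower()
    if key not in BACKEND_MODULES:
        choices = ", ".join(sorted(BACKEND_MODULES))
        raise ValueError(f"Unknown backend {backend!r}; expected one of: {choices}")
    return BACKEND_MODULES[key]


def load_backend(backend: str) -> ModuleType:
    """Import only the chosen backend, so the other one need not be installed."""
    return importlib.import_module(resolve_backend_module(backend))
```

Deferring the import until the backend is known means a host without Docker can still run the Singularity path, and the reverse.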

Singularity Support:

Singularity: Added a Singularity definition file to create a container image (sandbox.sif). It sets up the environment, installs dependencies, and configures a non-root user for running the sandbox.
benchmarking_sandbox_management_singularity.py: Introduced a Singularity sandbox manager to handle container lifecycle operations (build, start, stop, status, logs). It provides a Docker-free alternative for running the benchmarking sandbox.
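
As a rough illustration of the lifecycle operations listed above, the sketch below maps each action to a Singularity CLI invocation. It is an assumption about how such a manager might be structured, not the PR's implementation: the instance name is hypothetical, and the logs action is left out because the log location depends on the installation.

```python
from typing import List

IMAGE = "sandbox.sif"            # image name from the PR description
DEFINITION = "Singularity"       # definition file added in this PR
INSTANCE = "benchmark_sandbox"   # hypothetical instance name


def lifecycle_command(action: str) -> List[str]:
    """Return the argv list for one lifecycle action (nothing is executed here)."""
    commands = {
        "build": ["singularity", "build", IMAGE, DEFINITION],
        "start": ["singularity", "instance", "start", IMAGE, INSTANCE],
        "stop": ["singularity", "instance", "stop", INSTANCE],
        "status": ["singularity", "instance", "list"],
    }
    if action not in commands:
        raise ValueError(f"Unsupported action: {action!r}")
    return commands[action]
```

A caller could pass the returned list to `subprocess.run(cmd, check=True)`. Depending on the host, the build step may additionally need elevated privileges or `--fakeroot`.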

Copilot

Pull Request Overview

This PR introduces an interactive benchmarking framework that supports both Docker and Singularity backends. It adds a unified interactive agent tester, a new Singularity sandbox manager, and a Singularity definition file for container image setup.

Added a Singularity sandbox manager that offers a Docker-free alternative for container lifecycle operations (build, start, stop, status, logs).
Created a new Singularity definition file to build the container image (sandbox.sif).
Added an interactive agent tester that conditionally imports and uses either the Docker or the Singularity backend.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File	Description
benchmarking/sandbox/benchmarking_sandbox_management_singularity.py	Adds Singularity-specific container lifecycle methods and REPL support.
benchmarking/sandbox/Singularity	Provides a Singularity definition file for building the container image.
benchmarking/InteractiveAgentTester.py	Implements backend selection logic for interactive benchmarking.

Copilot · 2025-05-22T20:32:26Z

benchmarking/sandbox/Singularity

+NB_USER="sandboxuser"
+NB_UID=1001
+NB_GID=1001
+su - =${NB_USER} # USER=${NB_USER}


The command 'su - =${NB_USER}' appears to include an extraneous '=' which likely causes an error. Please update it to 'su - ${NB_USER}'.

Suggested change

-su - =${NB_USER} # USER=${NB_USER}
+su - ${NB_USER} # USER=${NB_USER}
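
To make the effect of the stray '=' concrete, the sketch below approximates how a POSIX-style shell would expand and tokenize each line, then reports the user name that `su` would receive. It is only an approximation built from Python's `string.Template` and `shlex`; the helper name `su_target` is illustrative, and the value of NB_USER is taken from the definition file excerpt above.

```python
import shlex
from string import Template


def su_target(line: str, env: dict) -> str:
    """Expand ${VAR} references, drop the trailing comment, return the last argument."""
    expanded = Template(line).substitute(env)
    tokens = shlex.split(expanded, comments=True)  # '#' starts a comment
    return tokens[-1]


env = {"NB_USER": "sandboxuser"}
original = su_target("su - =${NB_USER} # USER=${NB_USER}", env)
suggested = su_target("su - ${NB_USER} # USER=${NB_USER}", env)
print(original)   # =sandboxuser
print(suggested)  # sandboxuser
```

With the original line, the '=' becomes part of the user name, so `su` would most likely fail to find that account.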

djriffle added 2 commits May 22, 2025 15:15

Created an Interactive Agent Script

40f8160

Adding Singularity Support

3a7f9f9

djriffle requested a review from Copilot May 22, 2025 20:32

Copilot AI reviewed May 22, 2025

View reviewed changes

djriffle merged commit a9365ac into main May 22, 2025
1 of 2 checks passed

djriffle deleted the InteractiveBenchmarkTooling branch June 2, 2025 01:38
