Added Batch Integration Benchmarking and Auto Benchmarking Logs#8
Conversation
Pull Request Overview
This PR enhances the benchmarking framework with integration-quality metrics, automatic result persistence, and refines the interactive agent testing loop. It also introduces a structured multi-agent configuration and updates dependencies and VCS ignores.
- Add SCIB-based integration metrics and persist benchmark outputs with code snippets.
- Refactor `input_loop` to support recursive continuation and graceful exit.
- Define agent roles and delegation in a JSON system file; update `.gitignore` and replace `scib` with `scib-metrics`.
Reviewed Changes
Copilot reviewed 8 out of 13 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| benchmarking/sandbox/requirements.txt | Swapped scib for scib-metrics in dependencies. |
| benchmarking/prompt_testing/MultiAgentTester.py | Refactored user input loop; added recursive benchmarking. |
| benchmarking/prompt_testing/MultiAgentAutoTester.py | Implemented JSONL persistence and code-snippet dumping. |
| benchmarking/auto_metrics/IntegrationMetrics.py | New IntegrationMetric class using scib-metrics. |
| benchmarking/agents/integration_system.json | Configured three specialized agents with delegation rules. |
| benchmarking/agents/AgentSystem.py | Appended strict delegation formatting to prompts. |
| benchmarking/.gitignore | Added *.pyc to ignore Python bytecode files. |
Comments suppressed due to low confidence (1)
benchmarking/auto_metrics/IntegrationMetrics.py:11
`AutoMetric` is not imported, causing a `NameError`. Add the appropriate import (e.g., `from benchmarking.auto_metrics.BaseMetric import AutoMetric`).

```python
class IntegrationMetric(AutoMetric):
```
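For context, a self-contained sketch of what such a metric class might look like once the base class is in scope. `AutoMetric` here is a local stand-in for the project's base class, and the separation score is a simplified illustration, not the SCIB silhouette computed by the actual PR:

```python
import numpy as np

class AutoMetric:
    """Illustrative stand-in for the project's AutoMetric base class."""
    def compute(self, *args, **kwargs):
        raise NotImplementedError

class IntegrationMetric(AutoMetric):
    """Toy integration score: distance between cell-type centroids relative
    to the average within-type spread (higher = better separated types)."""
    def compute(self, embedding: np.ndarray, labels: np.ndarray) -> float:
        groups = [embedding[labels == lab] for lab in np.unique(labels)]
        centroids = np.stack([g.mean(axis=0) for g in groups])
        # Mean distance of points to their own centroid, averaged over types
        within = np.mean([np.linalg.norm(g - g.mean(axis=0), axis=1).mean()
                          for g in groups])
        # Mean pairwise distance between distinct centroids
        diffs = centroids[:, None, :] - centroids[None, :, :]
        k = len(groups)
        between = np.linalg.norm(diffs, axis=-1).sum() / (k * (k - 1))
        return float(between / within)
```

The real class would delegate to `scib-metrics` functions instead of computing the score by hand, but the shape of the subclass is the same.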
```python
        return "break"
    if user_in.lower() == "benchmark" and benchmark_module:
        run_benchmark(mgr, benchmark_module)
    input_loop()  # Recurse to continue the loop after benchmarks
```
The recursive call to `input_loop()` is missing a `return`, so the result isn't propagated back and may lead to incorrect control flow. Change it to `return input_loop()`.
```diff
-    input_loop()  # Recurse to continue the loop after benchmarks
+    return input_loop()  # Recurse to continue the loop after benchmarks
```
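As an aside, an iterative loop sidesteps both the missing-return bug and Python's recursion limit during long sessions. A minimal sketch, with the reader and benchmark callback passed in as parameters so it stands alone (the PR's actual function reads from `input` and calls `run_benchmark(mgr, benchmark_module)`):

```python
def input_loop(reader, run_benchmark=None):
    """Iterative input loop: no recursion, so no return value is ever lost."""
    while True:
        user_in = reader()
        if user_in.lower() == "exit":
            return "break"
        if user_in.lower() == "benchmark" and run_benchmark:
            run_benchmark()
            # No recursive call needed: the while loop continues naturally
```

Each benchmark run simply falls through to the next iteration, so the eventual `"break"` always reaches the original caller.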
```python
import anndata
import numpy as np
```
[nitpick] Neither `anndata` nor `numpy` is used in this module; consider removing these imports to reduce unused dependencies.
```diff
-import anndata
-import numpy as np
```
```json
        }
    },
    "integration_expert": {
        "prompt": "You are the *integration expert*. You specialize in combining multiple single-cell datasets and correcting for batch effects using scvi-tools.\n\nExample of a task you would perform:\n```python\nimport scvi\nimport scanpy as sc\n\n# Assume 'adata' is loaded and preprocessed with a 'batch' column\n# Find highly variable genes across batches for integration\nsc.pp.highly_variable_genes(\n    adata,\n    n_top_genes=2000,\n    subset=True,\n    layer='counts',\n    flavor='seurat_v3',\n    batch_key='batch'\n)\n\n# Set up the AnnData object for the scVI model\nscvi.model.SCVI.setup_anndata(adata, layer='counts', batch_key='batch')\n\n# Create and train the scVI model\nmodel = scvi.model.SCVI(adata, n_layers=2, n_latent=30)\nmodel.train()\n\n# Store the integrated latent representation in the AnnData object\nadata.obsm['X_scVI'] = model.get_latent_representation()\n\nprint('Integration complete. Integrated embedding is in adata.obsm[\"X_scVI\"].')\n``` you remeber to wrap your code in triple backticks and python",
```
Typo: `remeber` should be `remember`.
```diff
-    "prompt": "You are the *integration expert*. You specialize in combining multiple single-cell datasets and correcting for batch effects using scvi-tools.\n\nExample of a task you would perform:\n```python\nimport scvi\nimport scanpy as sc\n\n# Assume 'adata' is loaded and preprocessed with a 'batch' column\n# Find highly variable genes across batches for integration\nsc.pp.highly_variable_genes(\n    adata,\n    n_top_genes=2000,\n    subset=True,\n    layer='counts',\n    flavor='seurat_v3',\n    batch_key='batch'\n)\n\n# Set up the AnnData object for the scVI model\nscvi.model.SCVI.setup_anndata(adata, layer='counts', batch_key='batch')\n\n# Create and train the scVI model\nmodel = scvi.model.SCVI(adata, n_layers=2, n_latent=30)\nmodel.train()\n\n# Store the integrated latent representation in the AnnData object\nadata.obsm['X_scVI'] = model.get_latent_representation()\n\nprint('Integration complete. Integrated embedding is in adata.obsm[\"X_scVI\"].')\n``` you remeber to wrap your code in triple backticks and python",
+    "prompt": "You are the *integration expert*. You specialize in combining multiple single-cell datasets and correcting for batch effects using scvi-tools.\n\nExample of a task you would perform:\n```python\nimport scvi\nimport scanpy as sc\n\n# Assume 'adata' is loaded and preprocessed with a 'batch' column\n# Find highly variable genes across batches for integration\nsc.pp.highly_variable_genes(\n    adata,\n    n_top_genes=2000,\n    subset=True,\n    layer='counts',\n    flavor='seurat_v3',\n    batch_key='batch'\n)\n\n# Set up the AnnData object for the scVI model\nscvi.model.SCVI.setup_anndata(adata, layer='counts', batch_key='batch')\n\n# Create and train the scVI model\nmodel = scvi.model.SCVI(adata, n_layers=2, n_latent=30)\nmodel.train()\n\n# Store the integrated latent representation in the AnnData object\nadata.obsm['X_scVI'] = model.get_latent_representation()\n\nprint('Integration complete. Integrated embedding is in adata.obsm[\"X_scVI\"].')\n``` you remember to wrap your code in triple backticks and python",
```
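For illustration, consuming a configuration file shaped like the snippet above might look as follows. The `{agent_name: {"prompt": ...}}` layout is inferred from the fragment; the function name and any extra per-agent fields are assumptions, not the PR's actual loader:

```python
import json

def load_agent_system(path):
    """Parse an agent-system JSON file into {agent_name: prompt} pairs."""
    with open(path) as f:
        config = json.load(f)
    # Each top-level key names an agent; its "prompt" is the system prompt
    return {name: spec["prompt"] for name, spec in config.items()}
```

A caller would then hand each prompt to the corresponding agent at startup, keeping role definitions out of the code.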
```python
    full_prompt += f"\n- Command: `{name}`"
    full_prompt += f"\n    - Description: {command.description}"
    full_prompt += f"\n    - Target Agent: {command.target_agent}"
    full_prompt += "YOU MUST USE THESE EXACT COMMANDS TO DELEGATE TASKS. NO OTHER FORMATTING OR COMMANDS ARE ALLOWED."
```
This line is inside the loop that appends each command, so it is appended once per command. Move it outside the loop so it appears just once.
```diff
-        full_prompt += "YOU MUST USE THESE EXACT COMMANDS TO DELEGATE TASKS. NO OTHER FORMATTING OR COMMANDS ARE ALLOWED."
+    full_prompt += "YOU MUST USE THESE EXACT COMMANDS TO DELEGATE TASKS. NO OTHER FORMATTING OR COMMANDS ARE ALLOWED."
```
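To illustrate the fix, a sketch of the corrected prompt-building logic. The `commands` mapping, its attribute names, and the wrapper function are assumptions based on the diff fragment above:

```python
def build_delegation_prompt(full_prompt, commands):
    """List every delegation command, then append the formatting rule once."""
    for name, command in commands.items():
        full_prompt += f"\n- Command: `{name}`"
        full_prompt += f"\n    - Description: {command.description}"
        full_prompt += f"\n    - Target Agent: {command.target_agent}"
    # Outside the loop: appended a single time regardless of command count
    full_prompt += "\nYOU MUST USE THESE EXACT COMMANDS TO DELEGATE TASKS. NO OTHER FORMATTING OR COMMANDS ARE ALLOWED."
    return full_prompt
```

With the instruction outside the loop, the prompt grows by one rule line total rather than one per command.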
This pull request introduces significant enhancements to the benchmarking framework, including new agent configurations, integration quality metrics, and benchmarking persistence. It also includes minor updates to `.gitignore` and dependencies. Below is a breakdown of the most important changes:

Enhancements to Benchmarking Framework
- New `IntegrationMetric` class in `benchmarking/auto_metrics/IntegrationMetrics.py` computes SCIB integration quality metrics (e.g., batch silhouette, cell type silhouette, isolated label F1) using `scib-metrics`. This provides a detailed evaluation of single-cell data integration quality.
- `benchmarking/prompt_testing/MultiAgentAutoTester.py` now persists benchmarking results, including metadata, metrics, and code snippets. Results are stored in JSONL format, and code snippets are saved as separate files for reproducibility. [1] [2] [3] [4] [5] [6]

Multi-Agent System Enhancements
- New `integration_system.json` file defines three agents (`master_agent`, `general_coder`, `integration_expert`) with specialized roles and delegation commands for single-cell analysis tasks. This establishes a clear hierarchy and task delegation mechanism.

Codebase Improvements
- Refactored the user input loop in `benchmarking/prompt_testing/MultiAgentTester.py` to handle user input more cleanly, including recursive continuation after benchmarks and graceful exit handling. [1] [2]

Miscellaneous Updates
- Dependency update: replaced `scib` with `scib-metrics` in `benchmarking/sandbox/requirements.txt` to align with the new integration metrics implementation.
- `.gitignore` update: added `*.pyc` to exclude Python bytecode files from version control.

These changes collectively enhance the benchmarking system's capabilities, improve maintainability, and ensure better organization of results and agent configurations.
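The JSONL persistence described for `MultiAgentAutoTester.py` can be sketched as follows. File paths, field names, and the function itself are illustrative assumptions, not the PR's actual schema:

```python
import json
from pathlib import Path

def persist_result(out_dir, run_id, metadata, metrics, code_snippet):
    """Append one benchmark record per line (JSONL) and dump the generated
    code snippet to its own file so each run stays reproducible."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Snippet goes to a separate file; the record stores only its name
    snippet_path = out / f"{run_id}.py"
    snippet_path.write_text(code_snippet)
    record = {"run_id": run_id, "metadata": metadata,
              "metrics": metrics, "snippet": snippet_path.name}
    # Append-only JSONL: one self-contained JSON object per line
    with open(out / "results.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

Appending rather than rewriting means interrupted runs never corrupt earlier records, and each line can be parsed independently.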