Added Batch Integration Benchmarking and Auto Benchmarking Logs#8
Conversation
Pull Request Overview
This PR enhances the benchmarking framework with integration-quality metrics, automatic result persistence, and refines the interactive agent testing loop. It also introduces a structured multi-agent configuration and updates dependencies and VCS ignores.
- Add SCIB-based integration metrics and persist benchmark outputs with code snippets.
- Refactor `input_loop` to support recursive continuation and graceful exit.
- Define agent roles and delegation in a JSON system file; update `.gitignore` and replace `scib` with `scib-metrics`.
Reviewed Changes
Copilot reviewed 8 out of 13 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| benchmarking/sandbox/requirements.txt | Swapped scib for scib-metrics in dependencies. |
| benchmarking/prompt_testing/MultiAgentTester.py | Refactored user input loop; added recursive benchmarking. |
| benchmarking/prompt_testing/MultiAgentAutoTester.py | Implemented JSONL persistence and code-snippet dumping. |
| benchmarking/auto_metrics/IntegrationMetrics.py | New IntegrationMetric class using scib-metrics. |
| benchmarking/agents/integration_system.json | Configured three specialized agents with delegation rules. |
| benchmarking/agents/AgentSystem.py | Appended strict delegation formatting to prompts. |
| benchmarking/.gitignore | Added *.pyc to ignore Python bytecode files. |
Comments suppressed due to low confidence (1)
benchmarking/auto_metrics/IntegrationMetrics.py:11
`AutoMetric` is not imported, causing a `NameError`. Add the appropriate import (e.g., `from benchmarking.auto_metrics.BaseMetric import AutoMetric`).

```python
class IntegrationMetric(AutoMetric):
```
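For context, a self-contained sketch of what such a metric class might look like once the base class is in scope. `AutoMetric` here is a local stand-in for the project's base class, and the separation score is a simplified illustration, not the SCIB silhouette computed by the actual PR:

```python
import numpy as np

class AutoMetric:
    """Illustrative stand-in for the project's AutoMetric base class."""
    def compute(self, *args, **kwargs):
        raise NotImplementedError

class IntegrationMetric(AutoMetric):
    """Toy integration score: distance between cell-type centroids relative
    to the average within-type spread (higher = better separated types)."""
    def compute(self, embedding: np.ndarray, labels: np.ndarray) -> float:
        groups = [embedding[labels == lab] for lab in np.unique(labels)]
        centroids = np.stack([g.mean(axis=0) for g in groups])
        # Mean distance of points to their own centroid, averaged over types
        within = np.mean([np.linalg.norm(g - g.mean(axis=0), axis=1).mean()
                          for g in groups])
        # Mean pairwise distance between distinct centroids
        diffs = centroids[:, None, :] - centroids[None, :, :]
        k = len(groups)
        between = np.linalg.norm(diffs, axis=-1).sum() / (k * (k - 1))
        return float(between / within)
```

The real class would delegate to `scib-metrics` functions instead of computing the score by hand, but the shape of the subclass is the same.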
```python
        return "break"
    if user_in.lower() == "benchmark" and benchmark_module:
        run_benchmark(mgr, benchmark_module)
    input_loop()  # Recurse to continue the loop after benchmarks
```
The recursive call to `input_loop()` is missing a `return`, so the result isn't propagated back and may lead to incorrect control flow. Change it to `return input_loop()`.
```diff
-    input_loop()  # Recurse to continue the loop after benchmarks
+    return input_loop()  # Recurse to continue the loop after benchmarks
```
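As an aside, an iterative loop sidesteps both the missing-return bug and Python's recursion limit during long sessions. A minimal sketch, with the reader and benchmark callback passed in as parameters so it stands alone (the PR's actual function reads from `input` and calls `run_benchmark(mgr, benchmark_module)`):

```python
def input_loop(reader, run_benchmark=None):
    """Iterative input loop: no recursion, so no return value is ever lost."""
    while True:
        user_in = reader()
        if user_in.lower() == "exit":
            return "break"
        if user_in.lower() == "benchmark" and run_benchmark:
            run_benchmark()
            # No recursive call needed: the while loop continues naturally
```

Each benchmark run simply falls through to the next iteration, so the eventual `"break"` always reaches the original caller.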
```python
import anndata
import numpy as np
```
[nitpick] Neither `anndata` nor `numpy` is used in this module; consider removing these imports to reduce unused dependencies.
```diff
-import anndata
-import numpy as np
```
```json
        }
    },
    "integration_expert": {
        "prompt": "You are the *integration expert*. You specialize in combining multiple single-cell datasets and correcting for batch effects using scvi-tools.\n\nExample of a task you would perform:\n```python\nimport scvi\nimport scanpy as sc\n\n# Assume 'adata' is loaded and preprocessed with a 'batch' column\n# Find highly variable genes across batches for integration\nsc.pp.highly_variable_genes(\n    adata,\n    n_top_genes=2000,\n    subset=True,\n    layer='counts',\n    flavor='seurat_v3',\n    batch_key='batch'\n)\n\n# Set up the AnnData object for the scVI model\nscvi.model.SCVI.setup_anndata(adata, layer='counts', batch_key='batch')\n\n# Create and train the scVI model\nmodel = scvi.model.SCVI(adata, n_layers=2, n_latent=30)\nmodel.train()\n\n# Store the integrated latent representation in the AnnData object\nadata.obsm['X_scVI'] = model.get_latent_representation()\n\nprint('Integration complete. Integrated embedding is in adata.obsm[\"X_scVI\"].')\n``` you remeber to wrap your code in triple backticks and python",
```
Typo: `remeber` should be `remember`.
```diff
-    "prompt": "You are the *integration expert*. You specialize in combining multiple single-cell datasets and correcting for batch effects using scvi-tools.\n\nExample of a task you would perform:\n```python\nimport scvi\nimport scanpy as sc\n\n# Assume 'adata' is loaded and preprocessed with a 'batch' column\n# Find highly variable genes across batches for integration\nsc.pp.highly_variable_genes(\n    adata,\n    n_top_genes=2000,\n    subset=True,\n    layer='counts',\n    flavor='seurat_v3',\n    batch_key='batch'\n)\n\n# Set up the AnnData object for the scVI model\nscvi.model.SCVI.setup_anndata(adata, layer='counts', batch_key='batch')\n\n# Create and train the scVI model\nmodel = scvi.model.SCVI(adata, n_layers=2, n_latent=30)\nmodel.train()\n\n# Store the integrated latent representation in the AnnData object\nadata.obsm['X_scVI'] = model.get_latent_representation()\n\nprint('Integration complete. Integrated embedding is in adata.obsm[\"X_scVI\"].')\n``` you remeber to wrap your code in triple backticks and python",
+    "prompt": "You are the *integration expert*. You specialize in combining multiple single-cell datasets and correcting for batch effects using scvi-tools.\n\nExample of a task you would perform:\n```python\nimport scvi\nimport scanpy as sc\n\n# Assume 'adata' is loaded and preprocessed with a 'batch' column\n# Find highly variable genes across batches for integration\nsc.pp.highly_variable_genes(\n    adata,\n    n_top_genes=2000,\n    subset=True,\n    layer='counts',\n    flavor='seurat_v3',\n    batch_key='batch'\n)\n\n# Set up the AnnData object for the scVI model\nscvi.model.SCVI.setup_anndata(adata, layer='counts', batch_key='batch')\n\n# Create and train the scVI model\nmodel = scvi.model.SCVI(adata, n_layers=2, n_latent=30)\nmodel.train()\n\n# Store the integrated latent representation in the AnnData object\nadata.obsm['X_scVI'] = model.get_latent_representation()\n\nprint('Integration complete. Integrated embedding is in adata.obsm[\"X_scVI\"].')\n``` you remember to wrap your code in triple backticks and python",
```
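For illustration, consuming a configuration file shaped like the snippet above might look as follows. The `{agent_name: {"prompt": ...}}` layout is inferred from the fragment; the function name and any extra per-agent fields are assumptions, not the PR's actual loader:

```python
import json

def load_agent_system(path):
    """Parse an agent-system JSON file into {agent_name: prompt} pairs."""
    with open(path) as f:
        config = json.load(f)
    # Each top-level key names an agent; its "prompt" is the system prompt
    return {name: spec["prompt"] for name, spec in config.items()}
```

A caller would then hand each prompt to the corresponding agent at startup, keeping role definitions out of the code.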
```python
    full_prompt += f"\n- Command: `{name}`"
    full_prompt += f"\n    - Description: {command.description}"
    full_prompt += f"\n    - Target Agent: {command.target_agent}"
    full_prompt += "YOU MUST USE THESE EXACT COMMANDS TO DELEGATE TASKS. NO OTHER FORMATTING OR COMMANDS ARE ALLOWED."
```
This line is inside the loop that appends each command, so it is appended once per command. Move it outside the loop so it appears just once.
```diff
-        full_prompt += "YOU MUST USE THESE EXACT COMMANDS TO DELEGATE TASKS. NO OTHER FORMATTING OR COMMANDS ARE ALLOWED."
+    full_prompt += "YOU MUST USE THESE EXACT COMMANDS TO DELEGATE TASKS. NO OTHER FORMATTING OR COMMANDS ARE ALLOWED."
```
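To illustrate the fix, a sketch of the corrected prompt-building logic. The `commands` mapping, its attribute names, and the wrapper function are assumptions based on the diff fragment above:

```python
def build_delegation_prompt(full_prompt, commands):
    """List every delegation command, then append the formatting rule once."""
    for name, command in commands.items():
        full_prompt += f"\n- Command: `{name}`"
        full_prompt += f"\n    - Description: {command.description}"
        full_prompt += f"\n    - Target Agent: {command.target_agent}"
    # Outside the loop: appended a single time regardless of command count
    full_prompt += "\nYOU MUST USE THESE EXACT COMMANDS TO DELEGATE TASKS. NO OTHER FORMATTING OR COMMANDS ARE ALLOWED."
    return full_prompt
```

With the instruction outside the loop, the prompt grows by one rule line total rather than one per command.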
This pull request introduces significant enhancements to the benchmarking framework, including new agent configurations, integration quality metrics, and benchmarking persistence. It also includes minor updates to `.gitignore` and dependencies. Below is a breakdown of the most important changes:

Enhancements to Benchmarking Framework
- New `IntegrationMetric` class in `benchmarking/auto_metrics/IntegrationMetrics.py` computes SCIB integration quality metrics (e.g., batch silhouette, cell type silhouette, isolated label F1) using `scib-metrics`. This provides a detailed evaluation of single-cell data integration quality.
- `benchmarking/prompt_testing/MultiAgentAutoTester.py` now persists benchmarking results, including metadata, metrics, and code snippets. Results are stored in JSONL format, and code snippets are saved as separate files for reproducibility. [1] [2] [3] [4] [5] [6]

Multi-Agent System Enhancements
- New `integration_system.json` file defines three agents (`master_agent`, `general_coder`, `integration_expert`) with specialized roles and delegation commands for single-cell analysis tasks. This establishes a clear hierarchy and task delegation mechanism.

Codebase Improvements
- Refactored the user input loop in `benchmarking/prompt_testing/MultiAgentTester.py` to handle user input more cleanly, including recursive continuation after benchmarks and graceful exit handling. [1] [2]

Miscellaneous Updates
- Dependency update: replaced `scib` with `scib-metrics` in `benchmarking/sandbox/requirements.txt` to align with the new integration metrics implementation.
- `.gitignore` update: added `*.pyc` to exclude Python bytecode files from version control.

These changes collectively enhance the benchmarking system's capabilities, improve maintainability, and ensure better organization of results and agent configurations.
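The JSONL persistence described for `MultiAgentAutoTester.py` can be sketched as follows. File paths, field names, and the function itself are illustrative assumptions, not the PR's actual schema:

```python
import json
from pathlib import Path

def persist_result(out_dir, run_id, metadata, metrics, code_snippet):
    """Append one benchmark record per line (JSONL) and dump the generated
    code snippet to its own file so each run stays reproducible."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Snippet goes to a separate file; the record stores only its name
    snippet_path = out / f"{run_id}.py"
    snippet_path.write_text(code_snippet)
    record = {"run_id": run_id, "metadata": metadata,
              "metrics": metrics, "snippet": snippet_path.name}
    # Append-only JSONL: one self-contained JSON object per line
    with open(out / "results.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

Appending rather than rewriting means interrupted runs never corrupt earlier records, and each line can be parsed independently.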