Added Agent Benchmarking
This pull request introduces a benchmarking framework for evaluating AI-generated code in the context of single-cell transcriptomics data analysis. It includes the implementation of a new evaluation script, dataset metadata, and supporting infrastructure, as well as updates to documentation and configuration files.
New Features and Functionality:

- **Evaluation Script:** Added `Evaluator.py`, which evaluates AI-generated code using OpenAI's API. It includes helper functions for formatting conversations, sending evaluation requests, and processing datasets. The script supports interactive usage and integrates with the `dotenv` library for API key management.
- **Dataset Metadata:** Introduced a new dataset metadata file, `spatial_transcriptomics_in_mouse_puck_191109_14.json`, which includes details such as citation, dataset ID, and cell count. This file is part of the benchmarking datasets.

Configuration and Setup:

- **Environment Configuration Script:** Added `create_benchmark_env.sh`, a script that securely prompts for the OpenAI API key and saves it to a `.env` file with restrictive file permissions.
- **`.gitignore` Updates:** Updated `.gitignore` to exclude `.env`, `__pycache__/`, `.DS_store`, and `outputs/`, preventing sensitive or unnecessary files from being tracked.
- **Requirements File:** Added `requirements.txt` with dependencies such as `openai`, `rich`, `docker`, and `cellxgene-census` to support the benchmarking framework.

Documentation:

- Added a `README.md` outlining the purpose, setup, and usage of the benchmarking framework, including instructions for dataset management, sandbox setup, and running the evaluation process.
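The dataset metadata file presumably follows a small JSON shape. The key names below are guesses derived only from the fields mentioned above (citation, dataset ID, cell count), and the values are placeholders, not the file's real contents:

```json
{
  "dataset_id": "<dataset id>",
  "citation": "<citation string>",
  "cell_count": 0
}
```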
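Based on the description above, the evaluation flow in `Evaluator.py` could look roughly like the sketch below. The function names, system prompt, and model choice are assumptions for illustration, not the PR's actual code; only `format_conversation` is pure, while the API call requires an `OPENAI_API_KEY` in `.env`.

```python
import os

def format_conversation(turns):
    """Flatten (role, content) pairs into chat-API message dicts."""
    return [{"role": role, "content": content} for role, content in turns]

def evaluate_generated_code(code, task_description, model="gpt-4o"):
    """Send AI-generated analysis code to OpenAI for evaluation.

    Imports are deferred so the pure helper above is usable even when
    `openai` / `python-dotenv` are not installed.
    """
    from dotenv import load_dotenv
    from openai import OpenAI

    load_dotenv()  # picks up OPENAI_API_KEY from the .env file
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    messages = format_conversation([
        ("system", "You are grading code written for a single-cell "
                   "transcriptomics analysis task."),  # assumed prompt
        ("user", f"Task:\n{task_description}\n\nCode to evaluate:\n{code}"),
    ])
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content
```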
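A minimal sketch of what `create_benchmark_env.sh` does, under the assumptions stated in the comments (the prompt text and the non-interactive fallback are illustrative, not the PR's actual script):

```shell
#!/usr/bin/env bash
# Hypothetical sketch of create_benchmark_env.sh.
set -euo pipefail

if [ -t 0 ]; then
    # Interactive: read the key without echoing it to the terminal.
    read -r -s -p "Enter your OpenAI API key: " OPENAI_KEY
    echo
else
    # Non-interactive (piped input); fall back to a placeholder on EOF
    # so the sketch stays runnable. The real script presumably requires
    # an interactive prompt.
    read -r OPENAI_KEY || OPENAI_KEY="sk-placeholder"
fi

# Save the key and restrict permissions so only the owner can read it.
printf 'OPENAI_API_KEY=%s\n' "$OPENAI_KEY" > .env
chmod 600 .env
echo ".env written with owner-only permissions (600)"
```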