Installation

To install the required dependencies, run:

pip install -r requirements.txt

Generation Command Example

Run the following command to generate results:

python main.py \
    --dataset humaneval \
    --signature \
    --provider_and_model openai:gpt-3.5-turbo-0125 \
    --flow basic \
    --range full \
    --output_path evaluation/basic_gpt35_turbo.jsonl
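
When scripting several runs, it can help to assemble the same command programmatically. The snippet below is a minimal sketch: the flags are copied from the example above, while the helper name `build_generation_command` is hypothetical and not part of this repository.

```python
import shlex


def build_generation_command(flow, provider_and_model, output_path, dataset="humaneval"):
    """Return the argv list for one generation run.

    Hypothetical helper; the flags mirror the example command in this README.
    """
    return [
        "python", "main.py",
        "--dataset", dataset,
        "--signature",
        "--provider_and_model", provider_and_model,
        "--flow", flow,
        "--range", "full",
        "--output_path", output_path,
    ]


cmd = build_generation_command(
    "basic",
    "openai:gpt-3.5-turbo-0125",
    "evaluation/basic_gpt35_turbo.jsonl",
)
# Print the shell-quoted command for inspection before running it.
print(shlex.join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` from the repository root would launch the run; the argv list avoids shell-quoting issues.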

Evaluation Command Example

To evaluate functional correctness, use:

python evaluation/evaluate_functional_correctness.py \
    --problem_file evaluation/data/HumanEval.jsonl.gz \
    --sample_file evaluation/basic_gpt35_turbo.jsonl
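
Functional correctness on HumanEval-style benchmarks is commonly summarized as pass@k. The sketch below shows the standard unbiased estimator for one problem. It assumes the evaluation script follows that convention, which this README does not state; `pass_at_k` is an illustrative name, not a function from this repository.

```python
from math import prod


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: 1 - C(n - c, k) / C(n, k).

    n: samples generated, c: samples that passed all tests, k: budget.
    """
    if n - c < k:
        # Fewer than k failing samples, so any k-subset contains a pass.
        return 1.0
    # Product form avoids computing large binomial coefficients.
    return 1.0 - prod(1.0 - k / i for i in range(n - c + 1, n + 1))


# With one sample per problem, pass@1 is simply pass (1.0) or fail (0.0).
print(pass_at_k(n=1, c=1, k=1))
print(pass_at_k(n=1, c=0, k=1))
```

Averaging this value over all problems gives the benchmark-level score.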

Flow Options

Available flow options:

basic
AC
ACT
debugger
ac_debugger
act_debugger
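
To compare flows, one output file per flow keeps the results separate. The following is a minimal sketch that derives output paths for the flow names listed above. The naming pattern is an assumption modeled on the example path `evaluation/basic_gpt35_turbo.jsonl`, and `output_path_for` is a hypothetical helper.

```python
# Flow names exactly as listed in this README.
FLOWS = ["basic", "AC", "ACT", "debugger", "ac_debugger", "act_debugger"]


def output_path_for(flow: str, model_tag: str) -> str:
    """Build an output path for one flow (assumed naming pattern)."""
    if flow not in FLOWS:
        raise ValueError(f"unknown flow: {flow!r}")
    return f"evaluation/{flow}_{model_tag}.jsonl"


paths = {flow: output_path_for(flow, "gpt35_turbo") for flow in FLOWS}
for flow, path in paths.items():
    # Each pair can be substituted into the generation command.
    print(f"--flow {flow} --output_path {path}")
```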

Dataset (`problem_file`) Options

Available datasets:

HumanEval.jsonl.gz
HumanEvalPlus.jsonl.gz
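
Both dataset files are gzip-compressed JSON Lines, so they can be inspected with the standard library alone. The sketch below is a generic reader, not the repository's own loader; the demo record and its `task_id` value are made up so that the snippet runs without the real data.

```python
import gzip
import json
import os
import tempfile


def read_problems(path):
    """Yield one dict per non-empty line of a gzip-compressed JSONL file."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


# Self-contained demo: write a tiny stand-in file, then read it back.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.jsonl.gz")
with gzip.open(demo_path, "wt", encoding="utf-8") as fh:
    fh.write(json.dumps({"task_id": "Demo/0"}) + "\n")  # hypothetical record

problems = list(read_problems(demo_path))
print(len(problems), sorted(problems[0].keys()))
```

Pointing `read_problems` at `evaluation/data/HumanEval.jsonl.gz` would list the real problems in the same way.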

LLM Options

The table below summarizes the LLMs used in this study.

Model endpoints for inference:

Hugging Face Endpoints

HuggingFace:HuggingFaceH4/zephyr-7b-beta
HuggingFace:Qwen/Qwen2.5-Coder-32B-Instruct
HuggingFace:meta-llama/Meta-Llama-3-8B-Instruct
HuggingFace:Qwen/QwQ-32B-Preview
HuggingFace:microsoft/Phi-3.5-mini-instruct
HuggingFace:mistralai/Mistral-7B-Instruct-v0.2

DeepSeek (Requires `--api_key`)

deepseek:deepseek-chat

OpenAI

openai:gpt-3.5-turbo-0125
openai:gpt-4o-mini
openai:gpt-4o

Anthropic (Requires `--api_key`)

anthropic:claude-3-haiku-20240307
anthropic:claude-3-5-sonnet-20241022
anthropic:claude-3-5-haiku-20241022

Groq

groq:llama-3.3-70b-versatile
groq:llama-3.1-8b-instant
groq:gemma2-9b-it
groq:mixtral-8x7b-32768

Vertex

vertex:gemini-2.0-flash-exp
vertex:gemini-1.0-pro
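
Every identifier above has the form `provider:model`, and Hugging Face model names contain a `/` but no colon. A minimal sketch of splitting such a value is shown below. How `main.py` actually parses `--provider_and_model` is not documented here, so the helper `split_provider_and_model` and the `NEEDS_API_KEY` set are illustrative assumptions based on the headings above.

```python
from typing import Tuple

# Providers whose headings in this README say they require --api_key.
NEEDS_API_KEY = {"deepseek", "anthropic"}


def split_provider_and_model(value: str) -> Tuple[str, str]:
    """Split 'provider:model' on the first colon only."""
    provider, sep, model = value.partition(":")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider:model', got {value!r}")
    return provider, model


for spec in ["HuggingFace:Qwen/QwQ-32B-Preview", "anthropic:claude-3-haiku-20240307"]:
    provider, model = split_provider_and_model(spec)
    needs_key = provider.lower() in NEEDS_API_KEY
    print(provider, model, "needs --api_key" if needs_key else "")
```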

Acknowledgement

Our implementation adapts code from LDB and prompt ideas from both LDB and Self-collaboration Code Generation via ChatGPT. We thank the authors for their high-quality open-source code!
