rong-hash/chipseek

Curriculum-Guided Dynamic Policy Optimization

Framework

Overall Framework of ChipExplore

Setup

1. LLaMA Factory Environment (for SFT)

Supervised Fine-Tuning (SFT) relies on the LLaMA Factory library. Please follow their official installation instructions to set up the environment:

Ensure you have the necessary dependencies installed as per their guide.

Reasoning cold-start data is provided in the data zip file.

2. CDPO Environment (For RL)

a. Training Setup

Our CDPO implementation mainly relies on Verl. Please install it using the Docker image:

docker create --runtime=nvidia --gpus all --net=host --shm-size="10g" --cap-add=SYS_ADMIN -v .:/workspace/verl --name verl <image:tag> sleep infinity
docker start verl
docker exec -it verl bash

b. EDA Tool Docker Containers:

Build the necessary Docker containers for simulation (Icarus Verilog) and synthesis/PPA analysis (Yosys, OpenROAD):

  • Build Icarus Verilog Docker:

    docker build -t iverilog -f docker/iverilog/Dockerfile .
  • Build EDA Environment Docker (Yosys, OpenROAD):

    docker build -t eda_env -f docker/eda/Dockerfile .

Ensure Docker is running on your system before executing these commands.

Dataset Format

Example data formats used for training and evaluation can be found in the data/ directory:

  • SFT Data: See data/sft_dataset_example.json for the instruction/output format used in supervised fine-tuning.
  • CDPO/RTLLM Data: See data/r1_dataset_example.json for the format including problem descriptions, gold solutions, testbenches, and PPA metrics used in CDPO and RTLLM evaluation.
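As a rough illustration of how such records can be consumed, here is a minimal Python sketch that round-trips an SFT-style record through JSON. The field names (instruction/input/output, the common LLaMA Factory alpaca-style schema) are assumptions; check data/sft_dataset_example.json for the exact keys.

```python
import json

# Hypothetical SFT record -- the keys below are assumed, not taken from
# the repository's actual example file.
sft_record = {
    "instruction": "Implement a 4-bit synchronous counter in Verilog.",
    "input": "",
    "output": "module counter(input clk, rst, output reg [3:0] q); endmodule",
}

# Round-trip through JSON, the same way a dataset file would store it.
serialized = json.dumps([sft_record])
records = json.loads(serialized)
assert records[0]["output"].endswith("endmodule")
print(len(records))  # one record loaded
```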

Training

1. Supervised Fine-Tuning (SFT)

SFT is performed using LLaMA Factory.

  • Configuration: The SFT training configuration is defined in recipes/Qwen2.5-7B-Instruct/sft/verilog_sft.yaml. Modify this file to change dataset paths, hyperparameters, etc.

  • Run Training: Execute the training using the llamafactory-cli. Adjust CUDA_VISIBLE_DEVICES as needed.

    CUDA_VISIBLE_DEVICES=0,1,2,3 FORCE_TORCHRUN=1 llamafactory-cli train recipe/sft/qwen2.5-coder-7B-full_sft.yaml

2. CDPO Training

Please ensure that the paths to the base models and datasets are correct in the shell scripts.

Train CodeQwen

./recipe/cdpo/test_cdpo_7b_verilog_codeqwen.sh

Train Qwen2.5-Coder

./recipe/cdpo/test_cdpo_7b_verilog_qwen.sh

Train DeepSeek-Coder

./recipe/cdpo/test_cdpo_7b_verilog_deepseekcoder.sh

Train CodeLlama

./recipe/cdpo/test_cdpo_7b_verilog_codellama.sh

Evaluation

Evaluation scripts are located in the src/evaluation/ directory. Ensure the CDPO environment (Python packages and Docker containers) is set up as described above.

1. RTLLM Benchmark

Generate Verilog code solutions for the RTLLM benchmark problems.

  • Script: src/evaluation/test_on_RTLLM.py
  • Usage:
    python src/evaluation/test_on_RTLLM.py \
      --model <path_to_your_trained_model_or_hf_name> \
      --n <number_of_samples_per_problem> \
      --temperature <sampling_temperature> \
      --gpu_ids <gpu_ids_to_use> \
      --benchmark_path benchmark/rtllm_benchmark.json
      # Add other arguments like --lora_path if needed
    Example:
    python src/evaluation/test_on_RTLLM.py --model /root_extends/model/Qwen2.5-Coder-7B-Verilog-sft --n 10 --temperature 0.7 --gpu_ids 0
    This will generate a .jsonl file in the generated_code/ directory containing the generated solutions.
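A minimal sketch of reading a generated .jsonl file and grouping samples per problem. The record fields used here ("task_id", "completion") are assumptions for illustration; inspect an actual file under generated_code/ for the real schema.

```python
import json
import os
import tempfile

# Fake records standing in for model output; field names are assumed.
samples = [
    {"task_id": "rtllm/adder_8bit", "completion": "module adder(); endmodule"},
    {"task_id": "rtllm/adder_8bit", "completion": "module adder(); endmodule"},
]

# Write them out in JSON Lines format (one JSON object per line).
path = os.path.join(tempfile.mkdtemp(), "samples.jsonl")
with open(path, "w") as f:
    for rec in samples:
        f.write(json.dumps(rec) + "\n")

# Read back and group completions by problem, ready for per-problem scoring.
by_task = {}
with open(path) as f:
    for line in f:
        rec = json.loads(line)
        by_task.setdefault(rec["task_id"], []).append(rec["completion"])

print({k: len(v) for k, v in by_task.items()})  # {'rtllm/adder_8bit': 2}
```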

2. VerilogEval Benchmark

Generate Verilog code solutions for the VerilogEval benchmark problems (Human or Machine generated).

  • Script: src/evaluation/test_on_verilogeval_vllm.py
  • Usage:
    python src/evaluation/test_on_verilogeval_vllm.py \
      --model <path_to_your_trained_model_or_hf_name> \
      --bench_type <Human_or_Machine> \
      --n <number_of_samples_per_problem> \
      --temperature <sampling_temperature> \
      --gpu_ids <gpu_ids_to_use> \
      # Add other arguments as needed
    Example:
    python src/evaluation/test_on_verilogeval_vllm.py --model /root_extends/model/Qwen2.5-Coder-7B-Verilog-sft --bench_type Human --n 10 --temperature 0.7 --gpu_ids 0
    This will also generate a .jsonl file in the generated_code/ directory.
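Since both benchmarks sample n solutions per problem, results are typically reported as pass@k. The sketch below implements the standard unbiased pass@k estimator (Chen et al., 2021); whether the repo's scoring scripts use exactly this estimator is an assumption.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    draws (without replacement) from n samples, of which c pass, is correct."""
    if n - c < k:
        return 1.0  # fewer than k failing samples: some draw must pass
    return 1.0 - comb(n - c, k) / comb(n, k)

# With n=10 samples and c=3 correct, pass@1 reduces to c/n = 0.3.
print(round(pass_at_k(10, 3, 1), 4))  # 0.3
```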

3. PPA Verification (for RTLLM)

python src/evaluation/benchmark_model_ppa.py --bench_type <bench_type>

bench_type should be one of: ppa, power, area, delay, area_delay, power_delay.
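To make the bench_type options concrete, here is a hypothetical aggregation sketch. The composite formulas (e.g. area_delay as an area-delay product) are assumptions for illustration only; consult src/evaluation/benchmark_model_ppa.py for the definitions actually used.

```python
def ppa_score(area: float, power: float, delay: float, bench_type: str) -> float:
    """Map a bench_type option to a scalar objective (assumed formulas)."""
    metrics = {
        "area": area,
        "power": power,
        "delay": delay,
        "area_delay": area * delay,    # area-delay product
        "power_delay": power * delay,  # power-delay product
        "ppa": area * power * delay,   # combined objective (assumption)
    }
    return metrics[bench_type]

print(ppa_score(100.0, 2.0, 1.5, "area_delay"))  # 150.0
```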

About

ChipSeek: Optimizing Verilog Generation via EDA-Integrated Reinforcement Learning (ACL 2026)
