A Python package for reward-based generation with large language models. DEAL provides flexible abstractions for integrating different types of reward models (neural network classifiers or programmatic verifiers) with language model generation.
Note that we were not able to release the official code package for the paper, so the experimental scaffolding built on top of this code is missing. Instead, this is a ground-up implementation of the paper that adapts the search code from the ARGS paper.
If you use this code for any decoding work, please consider citing our work.
@inproceedings{huang2025deal,
title={Deal: Decoding-time alignment for large language models},
author={Huang, James Y and Sengupta, Sailik and Bonadiman, Daniele and Lai, Yi-an and Gupta, Arshit and Pappas, Nikolaos and Mansour, Saab and Kirchhoff, Katrin and Roth, Dan},
booktitle={Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
pages={26280--26300},
year={2025}
}

- Flexible Reward Models: Support for neural network classifiers, regex-based verifiers, and programmatic reward functions
- Built-in Reward Functions: Pre-built reward functions for word count, English words, keywords, text length, and custom combinations
- Multi-Device Support: Generate on one device and score on another for efficient resource utilization
- Caching Support: Optional key-value caching for efficient sequential generation
- Multiple Generation Methods: Greedy and sampling-based approaches
- Well-Tested: Comprehensive pytest-based test suite with 65 passing tests
- Modern Python Packaging: Built with pyproject.toml, following PEP 517/518 standards
- Professional Setup: Includes code quality tools (black, isort, flake8, mypy) and comprehensive documentation
- Clone the repository:
git clone https://github.com/sailik1991/deal.git
cd deal
- Create and activate a virtual environment:
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install the package with development dependencies:
pip install --upgrade pip
pip install -e ".[dev]"
This installs the package in editable mode along with all development tools (pytest, black, isort, flake8, mypy).
If you only need to use the package without development tools:
# Create virtual environment
python3 -m venv venv
source venv/bin/activate
# Install the package without development dependencies
pip install -e .

Verify the installation:
python -c "from deal import DEAL; print('Installation successful.')"

import torch
from deal import DEAL, RewardModelClassifier

# Initialize reward model
reward_model = RewardModelClassifier(
    model_path="distilbert-base-uncased",
    device="cuda:0"
)

# Initialize DEAL
deal = DEAL(
    llm_path="gpt2",
    reward_model=reward_model,
    llm_dev="cuda:0",
    rm_dev="cuda:1"
)

# Generate with reward guidance
prompt = "The future of AI is"
output = deal.generate(
    prompt,
    weight=0.5,
    topk=5,
    max_new_token=50,
    method="greedy"
)
text = deal.tokens_to_text(output)
print(text)

import re
from transformers import AutoTokenizer
from deal import DEAL, RegexRewardModel

tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Create regex-based reward model
reward_model = RegexRewardModel(
    tokenizer=tokenizer,
    pattern=r"\b(excellent|great|amazing)\b"
)
deal = DEAL(
    llm_path="gpt2",
    reward_model=reward_model,
    llm_dev="cuda:0"
)
output = deal.generate(
    "This is",
    weight=1.0,
    topk=10,
    max_new_token=30
)

from transformers import AutoTokenizer
from deal import DEAL, ProgrammaticRewardModel

def custom_reward(text: str) -> float:
    """Reward longer sequences more."""
    return min(len(text) / 100.0, 1.0)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
reward_model = ProgrammaticRewardModel(
    tokenizer=tokenizer,
    reward_fn=custom_reward
)
deal = DEAL(
    llm_path="gpt2",
    reward_model=reward_model,
    llm_dev="cuda:0"
)
output = deal.generate(
    "Once upon a time",
    weight=0.8,
    topk=5,
    max_new_token=100
)

The package includes several pre-built reward functions for common use cases:
Word Count-based Reward:
from deal import DEAL, ProgrammaticRewardModel, word_count_reward
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("gpt2")
# Reward text with between 10-20 words
reward_fn = word_count_reward(min_words=10, max_words=20)
rm = ProgrammaticRewardModel(tokenizer=tokenizer, reward_fn=reward_fn)
deal = DEAL(llm_path="gpt2", reward_model=rm, llm_dev="cuda:0")
output = deal.generate("The future", weight=1.0, topk=5, max_new_token=50)

English Word Count-based Reward:
from deal import english_word_count_reward
# Reward text with at least 5 English words (filters non-alphabetic tokens)
reward_fn = english_word_count_reward(min_words=5)
rm = ProgrammaticRewardModel(tokenizer=tokenizer, reward_fn=reward_fn)

Keyword-based Reward:
from deal import contains_keywords_reward
# Reward text containing specific keywords
reward_fn = contains_keywords_reward(
    keywords=["excellent", "amazing", "great"],
    require_all=False # Match if any keyword present
)
rm = ProgrammaticRewardModel(tokenizer=tokenizer, reward_fn=reward_fn)

Combining Multiple Reward Functions:
from deal import combine_rewards, word_count_reward, contains_keywords_reward
# Combine multiple reward criteria
combined_fn = combine_rewards(
    [
        word_count_reward(min_words=5),
        contains_keywords_reward(keywords=["hello", "world"]),
    ],
    aggregation="mean" # average the two rewards
)
rm = ProgrammaticRewardModel(tokenizer=tokenizer, reward_fn=combined_fn)

Main class for reward-guided generation.
Parameters:
- llm_path (str): Path to the language model
- reward_model (RewardModel): Initialized reward model instance
- llm_dev (str): Device for the language model (default: "cuda:0")
- rm_dev (str): Device for the reward model (default: "cuda:1")
- torch_dtype (torch.dtype): Data type for models (default: torch.float16)
Methods:
- generate(prompt, weight=0., topk=1, max_new_token=128, method="greedy", temperature=0.7, chunk_size=5, debug=False): Generate text with reward guidance
- get_input_ids(prompt): Tokenize a prompt
- tokens_to_text(tokens): Decode tokens to text
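How `weight` and `topk` interact inside `generate` is not spelled out here. The sketch below shows one plausible reading, borrowed from ARGS-style decoding: at each step the top-k candidate tokens by language-model score are re-ranked by `lm_score + weight * reward`. The scoring rule, the toy `scores` dictionary, and the helper name `pick_next_token` are assumptions for illustration, not the package's actual implementation.

```python
from typing import Callable, Dict


def pick_next_token(
    prefix: str,
    lm_scores: Dict[str, float],
    reward_fn: Callable[[str], float],
    weight: float = 0.0,
    topk: int = 1,
) -> str:
    """Re-rank the top-k LM candidates with a reward (hypothetical helper).

    Assumes the combined score is lm_score + weight * reward(prefix + token).
    """
    # Keep only the k candidates the language model scores highest.
    candidates = sorted(lm_scores, key=lm_scores.get, reverse=True)[:topk]
    # Re-rank those candidates by the combined score.
    return max(
        candidates,
        key=lambda tok: lm_scores[tok] + weight * reward_fn(prefix + tok),
    )


# Toy example: the reward favors continuations containing "great".
scores = {" fine": -0.5, " great": -1.0, " bad": -3.0}


def toy_reward(text: str) -> float:
    return 1.0 if "great" in text else 0.0


plain = pick_next_token("This is", scores, toy_reward, weight=0.0, topk=3)
guided = pick_next_token("This is", scores, toy_reward, weight=1.0, topk=3)
narrow = pick_next_token("This is", scores, toy_reward, weight=1.0, topk=1)
```

Under this reading, `weight=0.0` or `topk=1` (the defaults listed above) reduces to plain greedy decoding, because the reward either contributes nothing or has only one candidate to rank.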
Uses a neural network sequence classifier for rewards.
import torch
from deal import RewardModelClassifier

rm = RewardModelClassifier(
    model_path="path/to/model",
    torch_dtype=torch.float16,
    device="cuda:0"
)

Uses regex pattern matching on decoded text. Returns 1.0 if pattern matches, 0.0 otherwise.
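The match rule described above can be sketched in plain Python. This is a minimal illustration that assumes the pattern may match anywhere in the decoded text (`re.search` rather than `re.fullmatch`); the helper name `regex_reward` is hypothetical and the package may differ in detail.

```python
import re


def regex_reward(text: str, pattern: str) -> float:
    """Return 1.0 if the pattern is found anywhere in the text, else 0.0."""
    return 1.0 if re.search(pattern, text) else 0.0


PATTERN = r"\b(excellent|great|amazing)\b"

hit = regex_reward("This is a great day", PATTERN)
# "greatest" does not match because \b requires a word boundary after "great".
miss = regex_reward("This is their greatest hit", PATTERN)
```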
from deal import RegexRewardModel
rm = RegexRewardModel(
    tokenizer=tokenizer,
    pattern=r"\b(pattern)\b"
)

Structured programmatic reward model with built-in reward functions and optional normalization.
from deal import ProgrammaticRewardModel, word_count_reward
rm = ProgrammaticRewardModel(
    tokenizer=tokenizer,
    reward_fn=word_count_reward(min_words=5, max_words=20),
    normalize=True,
    min_reward=0.0,
    max_reward=1.0
)

Available Built-in Reward Functions:
- word_count_reward(min_words, max_words, target_words, tolerance) - Reward based on word count thresholds
- english_word_count_reward(min_words, max_words, common_words_only) - Reward based on English word count, optionally filtering to common words
- contains_keywords_reward(keywords, require_all, case_sensitive) - Reward based on keyword presence
- text_length_reward(min_length, max_length) - Reward based on character length
- combine_rewards(reward_fns, weights, aggregation) - Combine multiple reward functions with mean/min/max/product aggregation
First, activate your virtual environment if not already active:
source venv/bin/activate # On Windows: venv\Scripts\activate
Then run the full test suite:
pytest
# Open htmlcov/index.html to view coverage report
To run a specific test file, use:
pytest tests/test_deal.py -v
To run specific tests using labels/markers, use:
# Run only unit tests
pytest -m unit
# Run only integration tests
pytest -m integration
# Skip slow tests
pytest -m "not slow"

This package includes a comprehensive test suite with 65 passing tests:
- test_deal.py: 56 tests covering DEAL class and reward models
- Tests for RewardModelClassifier, RegexRewardModel, and ProgrammaticRewardModel
- Tests for built-in programmatic reward functions
A coverage report is generated as HTML in the htmlcov/ directory after running tests with the --cov flag.
deal/
├── src/ # Source code
│ ├── __init__.py # Package initialization
│ └── deal.py # DEAL class and reward models
├── tests/ # Test suite
│ ├── __init__.py
│ └── test_deal.py # Tests for DEAL
├── pyproject.toml # Project metadata and configuration (PEP 517/518)
├── requirements.txt # Runtime dependencies
├── requirements-dev.txt # Development dependencies
├── pytest.ini # Pytest configuration
├── .gitignore # Git ignore patterns
├── README.md # This file
└── venv/ # Virtual environment (created locally)
# Activate virtual environment
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install with development dependencies
pip install -e ".[dev]"

The project uses:
- Black for code formatting
- isort for import sorting
- flake8 for linting
- mypy for type checking
Format your code before committing (with venv activated):
black src/ tests/
isort src/ tests/
flake8 src/ tests/
mypy src/

Add tests to the tests/ directory following the naming convention test_*.py:
import pytest
from deal import DEAL

def test_my_feature():
    """Test description."""
    # Your test code here
    assert True

Mark tests with appropriate markers:
@pytest.mark.unit
def test_unit_feature():
    pass

@pytest.mark.integration
def test_integration_feature():
    pass

@pytest.mark.slow
def test_slow_operation():
    pass

class DEAL:
    def __init__(
        self,
        llm_path: str,
        reward_model: RewardModel,
        llm_dev: str = "cuda:0",
        rm_dev: str = "cuda:1",
        torch_dtype: torch.dtype = torch.float16
    )
    def generate(
        self,
        prompt: str,
        weight: float = 0.,
        topk: int = 1,
        max_new_token: int = 128,
        method: str = "greedy",
        temperature: float = 0.7,
        chunk_size: int | str = 5,
        debug: bool = False
    ) -> torch.Tensor
    def get_input_ids(self, prompt: str) -> torch.Tensor
    def tokens_to_text(self, tokens: torch.Tensor) -> List[str]

class RewardModel(ABC):
    @abstractmethod
    def __call__(
        self,
        input_ids: torch.Tensor,
        attention_mask: Optional[torch.Tensor] = None,
        past_key_values = None,
        use_cache: bool = False
    ) -> Any
    @abstractmethod
    def eval(self)
    @abstractmethod
    def to(self, device: str)

class ProgrammaticRewardModel(RewardModel):
    def __init__(
        self,
        tokenizer,
        reward_fn: Callable[[str], float],
        normalize: bool = False,
        min_reward: float = 0.0,
        max_reward: float = 1.0
    )

Parameters:
- tokenizer: Tokenizer for converting tokens to text
- reward_fn: Callable that takes decoded text and returns reward score (float)
- normalize: If True, clip rewards to [min_reward, max_reward] range
- min_reward: Minimum reward value (used if normalize=True)
- max_reward: Maximum reward value (used if normalize=True)
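The clipping that `normalize=True` is described as performing can be illustrated with a small wrapper. This is a sketch of the stated behavior only; the wrapper name `with_clipping` and the toy reward are hypothetical and not part of the package.

```python
from typing import Callable


def with_clipping(
    reward_fn: Callable[[str], float],
    min_reward: float = 0.0,
    max_reward: float = 1.0,
) -> Callable[[str], float]:
    """Wrap a reward function so its output is clipped to [min_reward, max_reward]."""

    def clipped(text: str) -> float:
        return max(min_reward, min(reward_fn(text), max_reward))

    return clipped


def raw_length(text: str) -> float:
    # Toy reward that is unbounded above: one tenth of the character count.
    return len(text) / 10.0


clipped_length = with_clipping(raw_length)
short_score = clipped_length("hello")   # raw 0.5, left unchanged
long_score = clipped_length("a" * 50)   # raw 5.0, clipped to max_reward
```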
def word_count_reward(
    min_words: Optional[int] = None,
    max_words: Optional[int] = None,
    target_words: Optional[int] = None,
    tolerance: int = 0
) -> Callable[[str], float]

Reward text based on word count. Returns 1.0 if count is within range, 0.0 otherwise.
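A reward factory with this signature could look roughly like the sketch below. Two details are assumptions rather than documented behavior: words are split on whitespace, and `target_words` with `tolerance` is treated as shorthand for the range `target_words ± tolerance`. The name `sketch_word_count_reward` is hypothetical.

```python
from typing import Callable, Optional


def sketch_word_count_reward(
    min_words: Optional[int] = None,
    max_words: Optional[int] = None,
    target_words: Optional[int] = None,
    tolerance: int = 0,
) -> Callable[[str], float]:
    # Assumption: a target overrides the explicit bounds with target +/- tolerance.
    if target_words is not None:
        min_words = target_words - tolerance
        max_words = target_words + tolerance

    def reward(text: str) -> float:
        count = len(text.split())  # assumption: whitespace-delimited words
        if min_words is not None and count < min_words:
            return 0.0
        if max_words is not None and count > max_words:
            return 0.0
        return 1.0

    return reward


in_range = sketch_word_count_reward(min_words=2, max_words=4)
near_target = sketch_word_count_reward(target_words=5, tolerance=1)
```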
def english_word_count_reward(
    min_words: Optional[int] = None,
    max_words: Optional[int] = None,
    common_words_only: bool = False
) -> Callable[[str], float]

Reward text based on English word count (alphabetic tokens). Optionally filters to common English words.
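What counts as an "alphabetic token" is not defined here. One simple interpretation, sketched below, strips surrounding punctuation from each whitespace-delimited token and keeps it if `str.isalpha()` is true. The helper name and the punctuation handling are assumptions; note also that `isalpha()` accepts letters from any script, so this approximates rather than guarantees "English".

```python
import string


def count_alphabetic_words(text: str) -> int:
    """Count whitespace-delimited tokens that are purely alphabetic."""
    # Strip leading/trailing punctuation so "sentences," still counts as a word.
    words = (tok.strip(string.punctuation) for tok in text.split())
    return sum(1 for word in words if word.isalpha())


# "GPT-2" and "3" are rejected; the other four tokens are counted.
n_words = count_alphabetic_words("GPT-2 wrote 3 short sentences, quickly!")
```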
def contains_keywords_reward(
    keywords: List[str],
    require_all: bool = False,
    case_sensitive: bool = False
) -> Callable[[str], float]

Reward text based on keyword presence. Returns 1.0 if all (or any, depending on require_all) keywords are found, and a proportional reward for partial matches.
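The description leaves open exactly when the proportional reward applies. The sketch below takes one reading: with `require_all=False` any single keyword earns 1.0, and with `require_all=True` the reward is the fraction of keywords found. That reading, the use of substring matching, and the name `sketch_contains_keywords_reward` are all assumptions.

```python
from typing import Callable, List


def sketch_contains_keywords_reward(
    keywords: List[str],
    require_all: bool = False,
    case_sensitive: bool = False,
) -> Callable[[str], float]:
    def reward(text: str) -> float:
        haystack = text if case_sensitive else text.lower()
        needles = keywords if case_sensitive else [k.lower() for k in keywords]
        found = sum(1 for k in needles if k in haystack)  # substring match
        if require_all:
            # Assumed reading: partial credit equal to the fraction found.
            return found / len(needles) if needles else 0.0
        return 1.0 if found > 0 else 0.0

    return reward


any_kw = sketch_contains_keywords_reward(["hello", "world"])
all_kw = sketch_contains_keywords_reward(["hello", "world"], require_all=True)
```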
def text_length_reward(
    min_length: Optional[int] = None,
    max_length: Optional[int] = None
) -> Callable[[str], float]

Reward text based on character length. Returns 1.0 if length is within range, 0.0 otherwise.
def combine_rewards(
    reward_fns: List[Callable[[str], float]],
    weights: Optional[List[float]] = None,
    aggregation: str = "mean"
) -> Callable[[str], float]

Combine multiple reward functions. Supports "mean", "min", "max", and "product" aggregation methods.
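The four aggregation modes can be sketched as follows. How `weights` interacts with each mode is not documented here; this sketch assumes weights apply only to "mean" (as a weighted average) and are ignored otherwise. The name `sketch_combine_rewards` and the two toy reward functions are hypothetical.

```python
import math
from typing import Callable, List, Optional


def sketch_combine_rewards(
    reward_fns: List[Callable[[str], float]],
    weights: Optional[List[float]] = None,
    aggregation: str = "mean",
) -> Callable[[str], float]:
    if not reward_fns:
        raise ValueError("reward_fns must not be empty")
    if weights is None:
        weights = [1.0] * len(reward_fns)
    if len(weights) != len(reward_fns):
        raise ValueError("weights and reward_fns must have the same length")

    def combined(text: str) -> float:
        scores = [fn(text) for fn in reward_fns]
        if aggregation == "mean":
            # Assumption: weights define a weighted average.
            return sum(w * s for w, s in zip(weights, scores)) / sum(weights)
        if aggregation == "min":
            return min(scores)
        if aggregation == "max":
            return max(scores)
        if aggregation == "product":
            return math.prod(scores)
        raise ValueError(f"unknown aggregation: {aggregation}")

    return combined


def has_hello(text: str) -> float:
    return 1.0 if "hello" in text else 0.0


def is_short(text: str) -> float:
    return 1.0 if len(text) <= 10 else 0.0


sample = "hello everyone"  # contains "hello" but is longer than 10 characters
mean_score = sketch_combine_rewards([has_hello, is_short])(sample)
min_score = sketch_combine_rewards([has_hello, is_short], aggregation="min")(sample)
weighted = sketch_combine_rewards([has_hello, is_short], weights=[3.0, 1.0])(sample)
```

"min" and "product" behave like a logical AND over binary rewards, while "max" behaves like OR, which may help when choosing a mode.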
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
Reduce the chunk_size parameter or decrease topk:
deal.generate(
    prompt,
    topk=3, # Lower this from your current value
    chunk_size=5, # Lower this from your current value
)

Set the pad token for your tokenizer:
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("gpt2")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

For issues, questions, or suggestions, please open an issue on GitHub.