DEAL

A Python package for reward-based generation with large language models. DEAL provides flexible abstractions for integrating different types of reward models (neural network classifiers or programmatic verifiers) with language model generation.

Note that we were not able to release the official code package for the paper, so the experimental scaffolding around this code is missing. Instead, this is a ground-up implementation of the paper that modifies the search code from the ARGS paper.

If you use this code for any decoding work, please consider citing our work.

@inproceedings{huang2025deal,
  title={Deal: Decoding-time alignment for large language models},
  author={Huang, James Y and Sengupta, Sailik and Bonadiman, Daniele and Lai, Yi-an and Gupta, Arshit and Pappas, Nikolaos and Mansour, Saab and Kirchhoff, Katrin and Roth, Dan},
  booktitle={Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages={26280--26300},
  year={2025}
}

Features

Flexible Reward Models: Support for neural network classifiers, regex-based verifiers, and programmatic reward functions
Built-in Reward Functions: Pre-built reward functions for word count, English words, keywords, text length, and custom combinations
Multi-Device Support: Generate on one device and score on another for efficient resource utilization
Caching Support: Optional key-value caching for efficient sequential generation
Multiple Generation Methods: Greedy and sampling-based approaches
Well-Tested: Comprehensive pytest-based test suite with 65 passing tests
Modern Python Packaging: Built with pyproject.toml, following PEP 517/518 standards
Professional Setup: Includes code quality tools (black, isort, flake8, mypy) and comprehensive documentation

Installation

Development Setup (Recommended)

Clone the repository:

git clone https://github.com/sailik1991/deal.git
cd deal

Create and activate a virtual environment:

python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install the package with development dependencies:

pip install --upgrade pip
pip install -e ".[dev]"

This installs the package in editable mode along with all development tools (pytest, black, isort, flake8, mypy).

Production Installation

If you only need to use the package without development tools:

# Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate

# Install the package
pip install -e .

Verify Installation

python -c "from deal import DEAL; print('Installation successful.')"

Quick Start

Using DEAL with a Classifier Reward Model

import torch
from deal import DEAL, RewardModelClassifier

# Initialize reward model
reward_model = RewardModelClassifier(
    model_path="distilbert-base-uncased",
    device="cuda:0"
)

# Initialize DEAL
deal = DEAL(
    llm_path="gpt2",
    reward_model=reward_model,
    llm_dev="cuda:0",
    rm_dev="cuda:1"
)

# Generate with reward guidance
prompt = "The future of AI is"
output = deal.generate(
    prompt,
    weight=0.5,
    topk=5,
    max_new_token=50,
    method="greedy"
)

text = deal.tokens_to_text(output)
print(text)

Using DEAL with a Regex Reward Model

import re
from transformers import AutoTokenizer
from deal import DEAL, RegexRewardModel

tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Create regex-based reward model
reward_model = RegexRewardModel(
    tokenizer=tokenizer,
    pattern=r"\b(excellent|great|amazing)\b"
)

deal = DEAL(
    llm_path="gpt2",
    reward_model=reward_model,
    llm_dev="cuda:0"
)

output = deal.generate(
    "This is",
    weight=1.0,
    topk=10,
    max_new_token=30
)

Using DEAL with a Custom Function Reward Model

from transformers import AutoTokenizer
from deal import DEAL, ProgrammaticRewardModel

def custom_reward(text: str) -> float:
    """Reward longer sequences more."""
    return min(len(text) / 100.0, 1.0)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
reward_model = ProgrammaticRewardModel(
    tokenizer=tokenizer,
    reward_fn=custom_reward
)

deal = DEAL(
    llm_path="gpt2",
    reward_model=reward_model,
    llm_dev="cuda:0"
)

output = deal.generate(
    "Once upon a time",
    weight=0.8,
    topk=5,
    max_new_token=100
)

Using DEAL with Built-in Programmatic Reward Functions

The package includes several pre-built reward functions for common use cases:

Word Count-based Reward:

from deal import DEAL, ProgrammaticRewardModel, word_count_reward
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Reward text with between 10 and 20 words
reward_fn = word_count_reward(min_words=10, max_words=20)
rm = ProgrammaticRewardModel(tokenizer=tokenizer, reward_fn=reward_fn)

deal = DEAL(llm_path="gpt2", reward_model=rm, llm_dev="cuda:0")
output = deal.generate("The future", weight=1.0, topk=5, max_new_token=50)

English Word Count-based Reward:

from deal import english_word_count_reward

# Reward text with at least 5 English words (filters non-alphabetic tokens)
reward_fn = english_word_count_reward(min_words=5)
rm = ProgrammaticRewardModel(tokenizer=tokenizer, reward_fn=reward_fn)

Keyword-based Reward:

from deal import contains_keywords_reward

# Reward text containing specific keywords
reward_fn = contains_keywords_reward(
    keywords=["excellent", "amazing", "great"],
    require_all=False  # Match if any keyword present
)
rm = ProgrammaticRewardModel(tokenizer=tokenizer, reward_fn=reward_fn)

Combining Multiple Reward Functions:

from deal import combine_rewards, word_count_reward, contains_keywords_reward

# Combine multiple reward criteria
combined_fn = combine_rewards(
    [
        word_count_reward(min_words=5),
        contains_keywords_reward(keywords=["hello", "world"]),
    ],
    aggregation="mean"  # average the two rewards
)
rm = ProgrammaticRewardModel(tokenizer=tokenizer, reward_fn=combined_fn)

Usage

DEAL Class

Main class for reward-guided generation.

Parameters:

llm_path (str): Path to the language model
reward_model (RewardModel): Initialized reward model instance
llm_dev (str): Device for the language model (default: "cuda:0")
rm_dev (str): Device for the reward model (default: "cuda:1")
torch_dtype (torch.dtype): Data type for models (default: torch.float16)

Methods:

generate(prompt, weight=0., topk=1, max_new_token=128, method="greedy", temperature=0.7, chunk_size=5, debug=False): Generate text with reward guidance
get_input_ids(prompt): Tokenize a prompt
tokens_to_text(tokens): Decode tokens to text
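
The role of weight and topk can be illustrated with a toy sketch. It assumes an ARGS-style rule in which each of the topk most likely next tokens is scored as its language-model score plus weight times the reward of the extended text. This README does not state the exact rule, and pick_next_token, lm_scores, and reward_fn below are hypothetical names, not part of the package.

```python
from typing import Callable, Dict


def pick_next_token(
    prefix: str,
    lm_scores: Dict[str, float],
    reward_fn: Callable[[str], float],
    weight: float = 0.0,
    topk: int = 1,
) -> str:
    """Toy reward-guided greedy step (hypothetical helper, not the DEAL API)."""
    # Keep only the topk candidates the language model prefers.
    candidates = sorted(lm_scores, key=lm_scores.get, reverse=True)[:topk]
    # Re-rank them by LM score plus the weighted reward of the extended text.
    return max(
        candidates,
        key=lambda tok: lm_scores[tok] + weight * reward_fn(prefix + tok),
    )


def likes_great(text: str) -> float:
    """Made-up reward: 1.0 if the text mentions "great", else 0.0."""
    return 1.0 if "great" in text else 0.0


# Made-up LM scores for three candidate continuations of "This is".
scores = {" fine": 2.0, " great": 1.5, " bad": 1.0}

print(pick_next_token("This is", scores, likes_great, weight=0.0, topk=3))
print(pick_next_token("This is", scores, likes_great, weight=1.0, topk=3))
```

In this sketch, weight=0 ignores the reward and reduces the step to ordinary greedy decoding, and topk=1 leaves a single candidate, so the reward cannot change the choice.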

Reward Models

RewardModelClassifier

Uses a neural network sequence classifier for rewards.

import torch
from deal import RewardModelClassifier

rm = RewardModelClassifier(
    model_path="path/to/model",
    torch_dtype=torch.float16,
    device="cuda:0"
)

RegexRewardModel

Uses regex pattern matching on decoded text. Returns 1.0 if the pattern matches, 0.0 otherwise.

from deal import RegexRewardModel

rm = RegexRewardModel(
    tokenizer=tokenizer,
    pattern=r"\b(pattern)\b"
)
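
The 1.0/0.0 scoring rule can be sketched on plain strings. This assumes the pattern may match anywhere in the decoded text (re.search semantics), which the README does not specify; regex_reward is a hypothetical helper, not the package's implementation.

```python
import re


def regex_reward(text: str, pattern: str) -> float:
    """Return 1.0 if the pattern is found anywhere in text, else 0.0."""
    # Assumption: search semantics (match anywhere), not a full-string match.
    return 1.0 if re.search(pattern, text) else 0.0


pattern = r"\b(excellent|great|amazing)\b"
print(regex_reward("This is a great idea", pattern))
print(regex_reward("This is greatly exaggerated", pattern))
```

The word boundaries (\b) keep "greatly" from counting as a match for "great", so only the first call scores 1.0.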

ProgrammaticRewardModel

Structured programmatic reward model with built-in reward functions and optional normalization.

from deal import ProgrammaticRewardModel, word_count_reward

rm = ProgrammaticRewardModel(
    tokenizer=tokenizer,
    reward_fn=word_count_reward(min_words=5, max_words=20),
    normalize=True,
    min_reward=0.0,
    max_reward=1.0
)

Available Built-in Reward Functions:

word_count_reward(min_words, max_words, target_words, tolerance) - Reward based on word count thresholds
english_word_count_reward(min_words, max_words, common_words_only) - Reward based on English word count, optionally filtering to common words
contains_keywords_reward(keywords, require_all, case_sensitive) - Reward based on keyword presence
text_length_reward(min_length, max_length) - Reward based on character length
combine_rewards(reward_fns, weights, aggregation) - Combine multiple reward functions with mean/min/max/product aggregation

Running Tests

First, activate your virtual environment if not already active:

source venv/bin/activate  # On Windows: venv\Scripts\activate

Run All Tests

pytest
# If coverage reporting is enabled (e.g. via --cov), open htmlcov/index.html to view the report

To run a specific test file, use:

pytest tests/test_deal.py -v

To run specific tests using labels/markers, use:

# Run only unit tests
pytest -m unit

# Run only integration tests
pytest -m integration

# Skip slow tests
pytest -m "not slow"

Test Results

This package includes a comprehensive test suite with 65 passing tests:

test_deal.py: 56 tests covering DEAL class and reward models
- Tests for RewardModelClassifier, RegexRewardModel, and ProgrammaticRewardModel
- Tests for built-in programmatic reward functions

A coverage report is generated as HTML in the htmlcov/ directory when tests are run with the --cov flag.

Project Structure

deal/
├── src/                          # Source code
│   ├── __init__.py              # Package initialization
│   └── deal.py                  # DEAL class and reward models
├── tests/                        # Test suite
│   ├── __init__.py
│   └── test_deal.py             # Tests for DEAL
├── pyproject.toml               # Project metadata and configuration (PEP 517/518)
├── requirements.txt             # Runtime dependencies
├── requirements-dev.txt         # Development dependencies
├── pytest.ini                   # Pytest configuration
├── .gitignore                   # Git ignore patterns
├── README.md                    # This file
└── venv/                        # Virtual environment (created locally)

Development

Setup Development Environment

# Activate virtual environment
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install with development dependencies
pip install -e ".[dev]"

Code Style

The project uses:

Black for code formatting
isort for import sorting
flake8 for linting
mypy for type checking

Format and check your code before committing (with venv activated):

black src/ tests/
isort src/ tests/
flake8 src/ tests/
mypy src/

Adding New Tests

Add tests to the tests/ directory following the naming convention test_*.py:

import pytest
from deal import DEAL

def test_my_feature():
    """Test description."""
    # Your test code here
    assert True
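
As a more concrete illustration, a plain reward function can be tested without loading any model. The sketch below redefines the custom_reward function from the Quick Start locally so that it is self-contained; the test names are only examples.

```python
def custom_reward(text: str) -> float:
    """Reward longer sequences more (copied from the Quick Start example)."""
    return min(len(text) / 100.0, 1.0)


def test_custom_reward_empty_text():
    """An empty string earns no reward."""
    assert custom_reward("") == 0.0


def test_custom_reward_is_capped():
    """Text longer than 100 characters is capped at 1.0."""
    assert custom_reward("a" * 250) == 1.0


def test_custom_reward_scales_with_length():
    """Shorter text earns a proportional reward."""
    assert custom_reward("a" * 50) == 0.5
```

Tests like these run quickly and need no GPU, which makes them natural candidates for the unit marker.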

Mark tests with appropriate markers:

@pytest.mark.unit
def test_unit_feature():
    pass

@pytest.mark.integration
def test_integration_feature():
    pass

@pytest.mark.slow
def test_slow_operation():
    pass

API Reference

Core Classes

DEAL

class DEAL:
    def __init__(
        self,
        llm_path: str,
        reward_model: RewardModel,
        llm_dev: str = "cuda:0",
        rm_dev: str = "cuda:1",
        torch_dtype: torch.dtype = torch.float16
    )

    def generate(
        self,
        prompt: str,
        weight: float = 0.,
        topk: int = 1,
        max_new_token: int = 128,
        method: str = "greedy",
        temperature: float = 0.7,
        chunk_size: int | str = 5,
        debug: bool = False
    ) -> torch.Tensor

    def get_input_ids(self, prompt: str) -> torch.Tensor
    def tokens_to_text(self, tokens: torch.Tensor) -> List[str]

RewardModel (Abstract)

class RewardModel(ABC):
    @abstractmethod
    def __call__(
        self,
        input_ids: torch.Tensor,
        attention_mask: Optional[torch.Tensor] = None,
        past_key_values = None,
        use_cache: bool = False
    ) -> Any

    @abstractmethod
    def eval(self)

    @abstractmethod
    def to(self, device: str)

ProgrammaticRewardModel

class ProgrammaticRewardModel(RewardModel):
    def __init__(
        self,
        tokenizer,
        reward_fn: Callable[[str], float],
        normalize: bool = False,
        min_reward: float = 0.0,
        max_reward: float = 1.0
    )

Parameters:

tokenizer: Tokenizer for converting tokens to text
reward_fn: Callable that takes decoded text and returns reward score (float)
normalize: If True, clip rewards to [min_reward, max_reward] range
min_reward: Minimum reward value (used if normalize=True)
max_reward: Maximum reward value (used if normalize=True)
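
The clipping described for normalize can be sketched as follows. clip_reward is a hypothetical helper used only for illustration; the package's internal implementation may differ.

```python
def clip_reward(raw: float, min_reward: float = 0.0, max_reward: float = 1.0) -> float:
    """Clip a raw reward into the range [min_reward, max_reward]."""
    return max(min_reward, min(raw, max_reward))


print(clip_reward(1.7))   # above the range, clipped to max_reward
print(clip_reward(-0.3))  # below the range, clipped to min_reward
print(clip_reward(0.4))   # already inside the range, unchanged
```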

Built-in Reward Functions

word_count_reward

def word_count_reward(
    min_words: Optional[int] = None,
    max_words: Optional[int] = None,
    target_words: Optional[int] = None,
    tolerance: int = 0
) -> Callable[[str], float]

Reward text based on word count. Returns 1.0 if count is within range, 0.0 otherwise.
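
A minimal sketch of this behavior is shown below. It is an illustrative stand-in rather than the package's code, and it rests on three assumptions not confirmed by this README: words are whitespace-separated, the bounds are inclusive, and target_words with tolerance is shorthand for the range target_words ± tolerance.

```python
from typing import Callable, Optional


def word_count_reward_sketch(
    min_words: Optional[int] = None,
    max_words: Optional[int] = None,
    target_words: Optional[int] = None,
    tolerance: int = 0,
) -> Callable[[str], float]:
    """Illustrative stand-in for word_count_reward; not the package's code."""
    # Assumption: a target is shorthand for the range target +/- tolerance.
    if target_words is not None:
        min_words = target_words - tolerance
        max_words = target_words + tolerance

    def reward(text: str) -> float:
        count = len(text.split())  # Assumption: words are whitespace-separated.
        if min_words is not None and count < min_words:
            return 0.0
        if max_words is not None and count > max_words:
            return 0.0
        return 1.0

    return reward


in_range = word_count_reward_sketch(min_words=3, max_words=5)
print(in_range("one two three four"))  # 4 words
print(in_range("one two"))             # 2 words
near_target = word_count_reward_sketch(target_words=4, tolerance=1)
print(near_target("one two three"))    # 3 words
```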

english_word_count_reward

def english_word_count_reward(
    min_words: Optional[int] = None,
    max_words: Optional[int] = None,
    common_words_only: bool = False
) -> Callable[[str], float]

Reward text based on English word count (alphabetic tokens). Optionally filters to common English words.
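
Counting alphabetic tokens could look roughly like the sketch below. count_alphabetic_words is a hypothetical helper, and stripping surrounding punctuation before the check is an assumption; the package may tokenize differently, and the common_words_only filter is not modeled here.

```python
import string


def count_alphabetic_words(text: str) -> int:
    """Count whitespace-separated tokens that are purely alphabetic."""
    count = 0
    for token in text.split():
        # Assumption: surrounding punctuation is ignored before the check.
        core = token.strip(string.punctuation)
        if core.isalpha():
            count += 1
    return count


# "123" and "foo42" are not purely alphabetic, so they are not counted.
print(count_alphabetic_words("Hello, world! 123 foo42 ok"))
```

Note that str.isalpha also accepts non-ASCII letters, so this check alone does not restrict the count to English words.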

contains_keywords_reward

def contains_keywords_reward(
    keywords: List[str],
    require_all: bool = False,
    case_sensitive: bool = False
) -> Callable[[str], float]

Reward text based on keyword presence. Returns 1.0 if all keywords (or any keyword, depending on require_all) are found, and a proportional reward for partial matches.
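
One possible reading of this behavior is sketched below. It is an illustrative stand-in rather than the package's code, and it assumes substring matching, a 1.0/0.0 reward when require_all is False, and the fraction of keywords found when require_all is True. The README does not confirm any of these details.

```python
from typing import Callable, List


def keywords_reward_sketch(
    keywords: List[str],
    require_all: bool = False,
    case_sensitive: bool = False,
) -> Callable[[str], float]:
    """Illustrative stand-in for contains_keywords_reward; not the package's code."""

    def reward(text: str) -> float:
        haystack = text if case_sensitive else text.lower()
        needles = keywords if case_sensitive else [k.lower() for k in keywords]
        # Assumption: substring matching rather than whole-word matching.
        found = sum(1 for k in needles if k in haystack)
        if require_all:
            # Assumption: partial matches earn the fraction of keywords found.
            return found / len(needles) if needles else 0.0
        return 1.0 if found > 0 else 0.0

    return reward


any_of = keywords_reward_sketch(["hello", "world"])
all_of = keywords_reward_sketch(["hello", "world"], require_all=True)
print(any_of("Hello there"))  # one of two keywords is enough
print(all_of("Hello there"))  # only half of the keywords are present
print(all_of("hello world"))  # every keyword is present
```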

text_length_reward

def text_length_reward(
    min_length: Optional[int] = None,
    max_length: Optional[int] = None
) -> Callable[[str], float]

Reward text based on character length. Returns 1.0 if length is within range, 0.0 otherwise.

combine_rewards

def combine_rewards(
    reward_fns: List[Callable[[str], float]],
    weights: Optional[List[float]] = None,
    aggregation: str = "mean"
) -> Callable[[str], float]

Combine multiple reward functions. Supports "mean", "min", "max", and "product" aggregation methods.
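
The four aggregation modes can be sketched as follows. This is an illustrative stand-in rather than the package's code; in particular, treating weights as a weighted average that applies only to "mean" is an assumption, and has_hello and is_short are made-up reward functions.

```python
import math
from typing import Callable, List, Optional


def combine_rewards_sketch(
    reward_fns: List[Callable[[str], float]],
    weights: Optional[List[float]] = None,
    aggregation: str = "mean",
) -> Callable[[str], float]:
    """Illustrative stand-in for combine_rewards; not the package's code."""
    # Assumption: missing weights mean every function counts equally.
    if weights is None:
        weights = [1.0] * len(reward_fns)

    def reward(text: str) -> float:
        scores = [fn(text) for fn in reward_fns]
        if aggregation == "mean":
            # Assumption: weights only affect the mean, as a weighted average.
            return sum(w * s for w, s in zip(weights, scores)) / sum(weights)
        if aggregation == "min":
            return min(scores)
        if aggregation == "max":
            return max(scores)
        if aggregation == "product":
            return math.prod(scores)
        raise ValueError(f"Unknown aggregation: {aggregation}")

    return reward


def has_hello(text: str) -> float:
    return 1.0 if "hello" in text else 0.0


def is_short(text: str) -> float:
    return 1.0 if len(text) < 10 else 0.0


text = "hello there, world"  # contains "hello" but is not short
for mode in ("mean", "min", "max", "product"):
    combined = combine_rewards_sketch([has_hello, is_short], aggregation=mode)
    print(mode, combined(text))
```

With binary rewards, "min" and "product" both behave like a logical AND over the criteria, while "max" behaves like a logical OR.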

Contributing

Contributions are welcome! Please:

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Built with:

Troubleshooting

CUDA Out of Memory

Reduce chunk_size parameter or decrease topk:

deal.generate(
    prompt,
    topk=3,        # Use fewer candidates than in the run that failed
    chunk_size=2,  # Below the default of 5
)

Tokenizer Issues

Set the pad token for your tokenizer:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

Support

For issues, questions, or suggestions, please open an issue on GitHub.
