Janhutter/LLM-test-time-adaptation

STTRL: Sample Wise Test-Time Reinforcement Learning

Paper (coming soon)

📖Introduction

We extend the test-time reinforcement learning (TTRL) framework to a single-sample adaptation paradigm and run a comprehensive study across several datasets and models, comparing against other single-sample adaptation methods.

📜 Abstract

💡 How it Works

STTRL operates at test time, treating the generation of a solution as a sequential decision-making problem. For each test sample, we use a reward model to guide a policy (the language model) towards a better solution through reinforcement learning. This allows the model to refine its internal reasoning process on-the-fly, customized for the specific complexities of the question at hand.

Key Features:

  • Sample-Wise Adaptation: The model's generation strategy is optimized for each unique test instance.
  • Model-Agnostic: Can be applied to a wide range of autoregressive language models.
  • Improved Reasoning: Boosts performance on tasks requiring complex, multi-step thought processes.

📊 Main Results

✨ Getting Started

Follow these steps to set up the environment and reproduce our results.

All experiments were conducted on a machine with 4 x NVIDIA H100 80GB GPUs.

⚙️ Installation

  1. Clone the repository:
    git clone https://github.com/Janhutter/LLM-test-time-adaptation.git
    cd LLM-test-time-adaptation
  2. Create and activate the Conda environment:
    conda env create -f environment.yaml
    conda activate sttrl

🚀 Running Experiments

All scripts should be run from the verl/ directory.

cd verl

Running STTRL (Our Method)

The main script is main.py. You can specify the dataset, model, and other parameters.

Base Command Template:

python main.py --dataset <DATASET_NAME> --model <MODEL_NAME> --voting_function majority

Example: To reproduce our results on the AIME-TTT dataset with the Qwen/Qwen2.5-Math-1.5B model:

python main.py --dataset AIME-TTT --model Qwen/Qwen2.5-Math-1.5B --voting_function majority

Supported Datasets (--dataset):

  • AIME-TTT
  • AMC-TTT
  • MATH-TTT
  • GSM8K-TTT
  • GPQA-TTT

Supported Models (--model):

  • Qwen/Qwen2.5-Math-1.5B
  • Qwen/Qwen2.5-Math-7B
  • Qwen/Qwen2.5-7B
  • meta-llama/Llama-3.1-8B-Instruct
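
To run every dataset and model combination listed above, the command lines can be generated rather than typed by hand. The following is a minimal sketch: the helper names `build_command` and `sweep_commands` are hypothetical and not part of this repository, and only the flags documented in this README are used.

```python
import itertools
import shlex

# Dataset and model names copied from the lists above.
DATASETS = ["AIME-TTT", "AMC-TTT", "MATH-TTT", "GSM8K-TTT", "GPQA-TTT"]
MODELS = [
    "Qwen/Qwen2.5-Math-1.5B",
    "Qwen/Qwen2.5-Math-7B",
    "Qwen/Qwen2.5-7B",
    "meta-llama/Llama-3.1-8B-Instruct",
]


def build_command(dataset, model, voting_function="majority"):
    """Return the argv list for one main.py run (hypothetical helper)."""
    return [
        "python", "main.py",
        "--dataset", dataset,
        "--model", model,
        "--voting_function", voting_function,
    ]


def sweep_commands(datasets=DATASETS, models=MODELS):
    """Yield one shell-quoted command line per (dataset, model) pair."""
    for dataset, model in itertools.product(datasets, models):
        yield shlex.join(build_command(dataset, model))


if __name__ == "__main__":
    # Print the commands; pipe them to a shell or a job scheduler as needed.
    for line in sweep_commands():
        print(line)
```

Printing the commands instead of launching them keeps the sketch free of side effects; each line can be pasted into a terminal opened in the verl/ directory.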

Note for Non-Math Models: When using general-purpose models such as Qwen/Qwen2.5-7B or meta-llama/Llama-3.1-8B-Instruct, we found that adding the --val_temp 0.6 and --separate_ref_gpu flags improves performance:

python main.py \
    --dataset AIME-TTT \
    --model meta-llama/Llama-3.1-8B-Instruct \
    --voting_function majority \
    --val_temp 0.6 \
    --separate_ref_gpu

Supported Voting Functions (--voting_function):

  • majority
  • energy
  • confidence
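
For intuition about the majority option: in the original TTRL method, the most frequent answer among the sampled responses serves as a pseudo-label, and each response is rewarded for agreeing with it. The sketch below illustrates that idea only. The function name `majority_vote_rewards` is hypothetical, answer extraction is assumed to have happened already, and the implementation in main.py may differ. The energy and confidence options are not sketched here.

```python
from collections import Counter


def majority_vote_rewards(answers):
    """Score sampled answers against their own majority vote.

    `answers` holds the final answer string extracted from each sampled
    response. Returns (pseudo_label, rewards), where rewards[i] is 1.0 if
    answers[i] equals the most common answer and 0.0 otherwise.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    # Take the most frequent answer as the pseudo-label.
    pseudo_label, _ = Counter(answers).most_common(1)[0]
    rewards = [1.0 if a == pseudo_label else 0.0 for a in answers]
    return pseudo_label, rewards


# Four sampled answers to one question; "42" appears twice and wins.
label, rewards = majority_vote_rewards(["42", "41", "42", "7"])
print(label, rewards)  # prints: 42 [1.0, 0.0, 1.0, 0.0]
```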

Running TTFT (Our Method)

Use the following settings. The first command runs single-sample adaptation; the second adds --continuous for continuous adaptation.

Single-sample adaptation:

python main.py \
    --dataset AIME-TTT \
    --model Qwen/Qwen2.5-Math-1.5B \
    --voting_function majority \
    --adaptation_method ttft \
    --lr 0.0001

Continuous adaptation:

python main.py \
    --dataset AIME-TTT \
    --model Qwen/Qwen2.5-Math-1.5B \
    --voting_function majority \
    --adaptation_method ttft \
    --continuous \
    --lr 5e-5
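
The difference between the two modes can be pictured with a toy loop. This sketch assumes that single-sample adaptation restarts from the base weights for every test sample, while continuous adaptation carries updates forward from one sample to the next. That reading is an assumption, not a description of the repository's code, and `run_adaptation`, `adapt_step`, and `answer` are hypothetical stand-ins.

```python
import copy


def run_adaptation(samples, base_state, adapt_step, answer, continuous=False):
    """Adapt on each test sample, then answer it.

    adapt_step(state, sample) -> new state   (hypothetical update function)
    answer(state, sample)     -> prediction  (hypothetical inference function)
    """
    state = copy.deepcopy(base_state)
    predictions = []
    for sample in samples:
        if not continuous:
            # Single-sample mode: discard earlier updates, restart from base.
            state = copy.deepcopy(base_state)
        state = adapt_step(state, sample)
        predictions.append(answer(state, sample))
    return predictions


# Toy stand-ins: the "model" is a counter of how many updates it has seen.
base = {"steps": 0}
step = lambda state, sample: {"steps": state["steps"] + 1}
read = lambda state, sample: state["steps"]

print(run_adaptation(["q1", "q2", "q3"], base, step, read))
# prints: [1, 1, 1]
print(run_adaptation(["q1", "q2", "q3"], base, step, read, continuous=True))
# prints: [1, 2, 3]
```

In the toy run, every question in single-sample mode is answered by a model that has seen exactly one update, whereas in continuous mode the updates accumulate across questions.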

Running Baselines

To reproduce the baseline results, set the --adaptation_method flag to the name of the baseline.

Example (SLOT):

python main.py \
    --dataset AIME-TTT \
    --model Qwen/Qwen2.5-Math-1.5B \
    --voting_function majority \
    --adaptation_method slot

Example (MEMO):

python main.py \
    --dataset AIME-TTT \
    --model Qwen/Qwen2.5-Math-1.5B \
    --voting_function majority \
    --adaptation_method memo

Running TTRL (original code)

To run the original TTRL method, use the following command:

cd examples/ttrl/<MODEL_NAME>
bash aime.sh

Replace <MODEL_NAME> with the folder for the desired model, such as Math-1.5B or Math-7B. Each model folder also contains scripts for the other datasets.

✒️ Citation

About

Code for research on test-time adaptation in large vision-language models
