This repository contains a minimal, structured implementation of PPLM + a simplified RLHF proxy adapted from the notebooks in this workspace. The goal is to provide a reproducible experiment harness that:
- Implements simplified PPLM steering using BoW attribute models (a minimal sketch follows this list).
- Implements a lightweight RLHF proxy: a reward model plus a small supervised fine-tune on preferred examples.
- Runs experiments comparing Base, PPLM, RLHF, and Hybrid approaches and computes simple quantitative metrics.
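As a concrete illustration of the BoW steering, here is a minimal single-step sketch: the final hidden state is nudged along the gradient that increases the probability mass on a small bag of attribute words. The model choice, word list, and function names are illustrative assumptions, not the repo's actual `src/` API:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
model.requires_grad_(False)  # only the perturbation receives gradients

# Toy BoW attribute model: ids of a few poetry-flavored words.
bow_ids = [tok.encode(" " + w)[0] for w in ["moon", "rose", "river"]]

def pplm_next_token(input_ids, step_size=0.02, n_steps=3):
    """Pick the next token after steering the last hidden state toward the BoW."""
    with torch.no_grad():
        hidden = model.transformer(input_ids).last_hidden_state[:, -1:, :]
    delta = torch.zeros_like(hidden, requires_grad=True)
    for _ in range(n_steps):
        log_probs = torch.log_softmax(model.lm_head(hidden + delta)[:, -1, :], dim=-1)
        # Attribute loss: negative log of total probability assigned to BoW tokens.
        loss = -torch.logsumexp(log_probs[:, bow_ids], dim=-1).mean()
        loss.backward()
        with torch.no_grad():
            delta -= step_size * delta.grad / (delta.grad.norm() + 1e-10)
            delta.grad.zero_()
    with torch.no_grad():
        return model.lm_head(hidden + delta)[:, -1, :].argmax(dim=-1)

ids = tok("The night sky", return_tensors="pt").input_ids
print(tok.decode(pplm_next_token(ids)))
```

Full PPLM instead perturbs cached activations at every layer across several generation steps; `step_size` here plays the role of the repo's `--pplm-strength` default (0.02).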
## Structure

- `src/`: core modules (pplm, rlhf, generator, eval)
- `scripts/`: experiment runner
- `poetry_dataset.txt`: dataset (should be present in the repo root)
- `outputs/experiments/results.json`: saved experiment outputs
## Approaches

In short, the repo provides tools and data to reproduce and compare three complementary approaches to controlled poetry generation:
- PPLM: inference-time attribute steering (BoW or classifier-based steering)
- RLHF (proxy): supervised fine-tuning + learned reward-model reranking, a proxy for full RLHF (see the reranking sketch after this list)
- Hybrid: fine-tuned model + PPLM steering at inference time
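A minimal sketch of the reranking half of the proxy, assuming a fine-tuned causal LM plus tokenizer, a sentence-embedding function `embed`, and a trained reward model are already in scope (all names here are illustrative):

```python
import torch

def rerank_best(prompt, model, tok, reward_model, embed, n=8, max_new_tokens=40):
    # Sample n candidate continuations, score each with the learned reward
    # model, and keep the highest-scoring one.
    inputs = tok(prompt, return_tensors="pt")
    outs = model.generate(
        **inputs, do_sample=True, top_p=0.9, max_new_tokens=max_new_tokens,
        num_return_sequences=n, pad_token_id=tok.eos_token_id,
    )
    texts = [tok.decode(o, skip_special_tokens=True) for o in outs]
    with torch.no_grad():
        scores = reward_model(embed(texts)).squeeze(-1)  # one score per candidate
    return texts[int(scores.argmax())]
```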
This repo converts the experiments from notebooks into modular scripts and stores run outputs under `outputs-gpu/experiments/`.

- `src/` — core library: generator, PPLM implementation, reward model (MLP), evaluation helpers
- `scripts/` — CLI runners: training, single-run experiments, batch sweeps, analysis
- `models-gpu/` — (optional, large) model snapshots and reward-model state_dicts (may not be included in all releases)
- `outputs-gpu/experiments/` — experiment CSVs, per-run JSONs, aggregated analysis (e.g., `all_runs/master_results.csv`, `analysis_summary.csv`)
- `now question_.txt` — short architecture summary + Mermaid diagram (architecture visualization)
## Quick start

- Create an environment and install dependencies:

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

- Run a single experiment (example):

```bash
python scripts/run_experiments.py --model-dir models/gpt2_poetry_small --num-prompts 30 --device cpu
```

- Aggregate batch runs (example sweep):

```bash
python scripts/run_experiments_all.py --model-dirs "gpt2 models/gpt2_poetry_small" --lora-dirs "poetry-lora-final" --seeds "7 42 123" --num-prompts "30 60 100" --device cuda
```

- Analysis (after runs):

```bash
python scripts/analysis_results.py
python scripts/analysis_by_model.py
# open outputs-gpu/experiments/by_model/report.html
```

## Scripts

- `scripts/train_models.py` — supervised fine-tuning and reward-model training
- `scripts/run_experiments.py` — generate samples for Base / PPLM / RLHF / Hybrid and compute metrics
- `scripts/run_experiments_all.py` — orchestrate parameter sweeps and write `all_runs/master_results.csv`
- `scripts/analysis_results.py` / `scripts/analysis_by_model.py` — produce aggregated CSVs, summary stats, and plots
- `scripts/generate_interactive_step.py` — interactive stepwise generator for user feedback
## Defaults

- Base generator: GPT-2 family (default experiments use GPT-2 Small or `gpt2_poetry_small` when available)
- LoRA adapter defaults: rank=16, alpha=32, dropout=0.1 (adapter labeled `poetry-lora-final` in outputs)
- PPLM default step size: 0.02 (configurable via `--pplm-strength`)
- Reward model: small MLP over sentence embeddings, default hidden dims [256, 128, 64] (sketched below)

Note: the exported CSVs reflect runs with these defaults; to run ablation sweeps (LoRA rank, PPLM step sizes, reward architectures, epochs), pass distinct labels/args to `run_experiments_all.py` so results are recorded separately.
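For reference, a sketch of a reward model matching those defaults. The 384-dim input is an assumption (e.g., MiniLM-style sentence embeddings); the repo's actual module may differ:

```python
import torch.nn as nn

class RewardMLP(nn.Module):
    """Maps a sentence embedding to a scalar reward score."""
    def __init__(self, embed_dim=384, hidden_dims=(256, 128, 64)):
        super().__init__()
        layers, d = [], embed_dim
        for h in hidden_dims:
            layers += [nn.Linear(d, h), nn.ReLU()]
            d = h
        layers.append(nn.Linear(d, 1))  # scalar reward head
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)
```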
## Reproducibility

- Fix random seeds (scripts support seeds 7, 42, and 123 by convention).
- Record the exact `transformers`, `tokenizers`, and `torch` versions when reproducing training runs (a snippet follows this list).
- Exclude large model artifacts (safetensors, checkpoints) from Git and store them separately (Git LFS or external storage).
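One way to capture those versions alongside a run (assuming the packages are importable):

```python
import json, sys, torch, transformers, tokenizers

versions = {
    "python": sys.version.split()[0],
    "torch": torch.__version__,
    "transformers": transformers.__version__,
    "tokenizers": tokenizers.__version__,
}
print(json.dumps(versions, indent=2))  # e.g., save next to the run's outputs
```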
## Metrics

- Metrics collected per run: `avg_reward`, `perplexity`, `distinct_2`, `distinct_3` (see `all_runs/master_results.csv`; a reference implementation of distinct-n follows this list).
- Use `scripts/analysis_results.py` to compute grouped means and standard deviations and to produce `analysis_summary.csv`.
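Here `distinct_2`/`distinct_3` are assumed to follow the common definition — unique n-grams divided by total n-grams across samples; the repo's eval helpers may tokenize differently:

```python
def distinct_n(texts, n):
    # Ratio of unique n-grams to total n-grams over all samples
    # (higher = more diverse generations).
    ngrams, total = set(), 0
    for text in texts:
        tokens = text.split()
        grams = list(zip(*(tokens[i:] for i in range(n))))
        ngrams.update(grams)
        total += len(grams)
    return len(ngrams) / max(total, 1)

# distinct_2, distinct_3 = distinct_n(samples, 2), distinct_n(samples, 3)
```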
## Troubleshooting

- If GPU OOM occurs: use LoRA adapters, gradient accumulation, or mixed precision (bf16/AMP), or run on CPU for small tests (see the sketch after this list).
- If pushing to a remote fails due to large files, ensure `.gitignore` covers virtualenvs and model folders, and remove large tracked files from the index before pushing.
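A sketch of the gradient-accumulation + bf16 mitigation for the OOM case, assuming `model`, `optimizer`, and `loader` are already set up:

```python
import torch

accum_steps = 8  # effective batch size = micro-batch size * accum_steps

for step, batch in enumerate(loader):
    # bf16 autocast keeps activations in half precision, cutting memory use.
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = model(**batch).loss / accum_steps  # scale loss for accumulation
    loss.backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad(set_to_none=True)
```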
## Future work

- Add human preference collection for stronger reward models.
- Run full PPO-based RLHF (e.g., via `trl`) for larger-scale experiments.
- Add CI, unit tests, and packaged example configs for reproducible ablation runs.
Possible additions: a small `examples/` folder with a ready-to-run config for the exact experiment that produced `analysis_summary.csv`, and an SVG export of the Mermaid diagram placed in `docs/`.
