This repository is an experiment-first capstone project for aligning a small open model through staged training and evaluation.
It combines two related tracks:
- Rewrite quality alignment
  - Start from base model behavior.
  - Train a LoRA adapter with supervised fine-tuning (SFT) on rewrite pairs.
  - Improve behavior further with preference optimization (DPO).
- Tool-use benchmarking
  - Build a synthetic, single-turn tool-calling dataset.
  - Evaluate model outputs with strict JSON/tool/argument matching metrics.
  - Adapt raw generation logs into evaluator-compatible format.
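The adaptation step in the second track amounts to pulling a structured tool call out of free-form model output. A minimal sketch of that idea (the function name is hypothetical; the actual adapter script may work differently):

```python
import json
import re

def extract_tool_call(raw_text: str):
    """Pull the first JSON object out of free-form model output.

    Returns the parsed dict, or None if no valid JSON object is found.
    (Hypothetical helper; the real adapter may be more robust.)
    """
    # Grab the widest {...} span and attempt to parse it.
    match = re.search(r"\{.*\}", raw_text, re.DOTALL)
    if match is None:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None

raw = 'Sure! Here is the call: {"tool": "weather", "args": {"city": "Paris"}}'
print(extract_tool_call(raw))  # {'tool': 'weather', 'args': {'city': 'Paris'}}
```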
In short, this repo is meant to answer:
- How much can small-model behavior improve with simple SFT plus DPO?
- How do we measure tool-call correctness reliably and reproducibly?
- `1.test_inference.py`, `2.test_inference.py` - Early sanity checks for base-model text generation and chat-template behavior.
- `3.check_dataset.py` - Verifies rewrite train/eval JSONL files load correctly.
- `4.sft_lora.py` - Trains a LoRA adapter with TRL `SFTTrainer` on rewrite prompt-response pairs. Saves training checkpoints and the final adapter.
- `5.compare_before_after.py` - Compares base model outputs vs. SFT adapter outputs on eval prompts.
- `6.sample_and_score.py` - Samples multiple candidate rewrites and ranks them via a simple heuristic reward.
- `7.check_prefs.py` - Verifies preference datasets (`prefs.jsonl`, `prefs_large.jsonl`).
- `8.compare_pref_behavior.py` - Compares model generations against chosen/rejected preference examples.
- `9.dpo_lora.py` - Runs LoRA-based DPO training using preference pairs. Includes prompt-prefix consistency checks before training.
- `10.dpo_full_smoke_test.py` - Short full-parameter DPO smoke test to validate setup and catch OOM/config issues early.
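The prompt-prefix consistency check before DPO training can be sketched as a small validation pass over the preference pairs. This is a hypothetical illustration of the idea (the function name and exact rules are assumptions; `9.dpo_lora.py` may implement the check differently):

```python
def check_prompt_prefix(pairs):
    """Validate DPO preference pairs before training.

    Each pair should carry a non-empty prompt. If chosen/rejected were
    stored as full texts, both should start with the prompt so the
    trainer compares completions from a shared prefix.
    (Hypothetical check, not the repo's actual implementation.)
    """
    problems = []
    for i, pair in enumerate(pairs):
        prompt = pair.get("prompt", "")
        if not prompt:
            problems.append((i, "empty prompt"))
            continue
        for key in ("chosen", "rejected"):
            text = pair.get(key, "")
            if text.startswith(prompt):
                continue  # full-text style: prefix is consistent
            if prompt in text:
                # Prompt appears mid-text: likely a malformed pair.
                problems.append((i, f"{key} embeds prompt mid-text"))
    return problems

good = [{"prompt": "Rewrite: thx", "chosen": "Rewrite: thx -> Thank you.",
         "rejected": "Rewrite: thx -> thx!!"}]
bad = [{"prompt": "", "chosen": "x", "rejected": "y"}]
print(check_prompt_prefix(good))  # []
print(check_prompt_prefix(bad))   # [(0, 'empty prompt')]
```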
- `generate_tool_use_dataset.py` - Generates balanced tool-use examples for 5 tools: calculator, weather, time, search, reminder. Writes split files and docs under `data/tool_use_dataset_v1/`.
- `evaluate_tool_use.py` - Evaluates predictions against gold tool calls. Reports strict metrics such as:
  - valid JSON rate
  - tool exact match accuracy
  - argument exact match accuracy
  - strict success rate
  - per-tool breakdown and error buckets
- `adapt_predictions.py` - Converts diverse raw generation formats into evaluator-ready JSONL.
- `baseline_tool_use.ipynb`, `baseline_tool_use_v2.ipynb` - Interactive experimentation for tool-use baseline generation/evaluation.
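The strict-matching metrics above compose naturally: an invalid-JSON prediction automatically fails every stricter metric. A minimal sketch in that spirit (field names `tool`/`args` and the function name are assumptions; see `evaluate_tool_use.py` for the real metric definitions and error buckets):

```python
import json

def score_predictions(preds, golds):
    """Strict tool-call metrics (sketch, not the repo's actual evaluator).

    preds: raw prediction strings; golds: dicts with "tool" and "args".
    """
    n = len(golds)
    valid_json = tool_match = arg_match = strict = 0
    for raw, gold in zip(preds, golds):
        try:
            call = json.loads(raw)
        except json.JSONDecodeError:
            continue  # invalid JSON fails every stricter metric too
        valid_json += 1
        if call.get("tool") == gold["tool"]:
            tool_match += 1
        if call.get("args") == gold["args"]:
            arg_match += 1
        if call.get("tool") == gold["tool"] and call.get("args") == gold["args"]:
            strict += 1
    return {
        "valid_json_rate": valid_json / n,
        "tool_exact_match": tool_match / n,
        "arg_exact_match": arg_match / n,
        "strict_success": strict / n,
    }

golds = [
    {"tool": "calculator", "args": {"expression": "2+2"}},
    {"tool": "weather", "args": {"city": "Oslo"}},
]
preds = [
    '{"tool": "calculator", "args": {"expression": "2+2"}}',
    'not json',
]
print(score_predictions(preds, golds))
# {'valid_json_rate': 0.5, 'tool_exact_match': 0.5, 'arg_exact_match': 0.5, 'strict_success': 0.5}
```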
- `data/train.jsonl`, `data/eval.jsonl` - Supervised rewrite dataset used by SFT.
- `data/prefs.jsonl`, `data/prefs_large.jsonl` - Preference pairs (`prompt`, `chosen`, `rejected`) used by DPO and behavior checks.
- `data/tool_use_dataset_v1/` - Tool-use benchmark package:
  - `raw/` generated corpus
  - `processed/` split files
  - `docs/` label + split guidance
  - `tools_schema.json`
- `outputs/sft-run/`, `outputs/sft-final/` - SFT checkpoints and final LoRA adapter artifacts.
- `outputs/dpo-lora-run/`, `outputs/dpo-lora-final/` - DPO LoRA checkpoints and final adapter artifacts.
- `outputs/dpo-full-smoke/` - Smoke-test outputs for the full-parameter DPO run.
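The preference files are JSONL with one `prompt`/`chosen`/`rejected` record per line. A minimal loader/validator in the spirit of `7.check_prefs.py` (a sketch; the real script's checks may be stricter):

```python
import io
import json

REQUIRED_KEYS = {"prompt", "chosen", "rejected"}

def load_prefs(fp):
    """Load and validate a preference JSONL stream.

    (Hypothetical helper illustrating the expected record shape.)
    """
    pairs = []
    for lineno, line in enumerate(fp, start=1):
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)  # raises on malformed JSON
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            raise ValueError(f"line {lineno}: missing keys {sorted(missing)}")
        if record["chosen"] == record["rejected"]:
            raise ValueError(f"line {lineno}: chosen == rejected")
        pairs.append(record)
    return pairs

sample = io.StringIO(
    '{"prompt": "Rewrite: thx", "chosen": "Thank you.", "rejected": "thx!!"}\n'
)
print(len(load_prefs(sample)))  # 1
```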
- Run base sanity checks (`1`, `2`, `3`).
- Train the SFT adapter (`4`).
- Compare base vs. SFT (`5`) and inspect simple reward ranking (`6`).
- Validate preference data (`7`) and inspect preference behavior (`8`).
- Run DPO LoRA training (`9`).
- Optionally run the full DPO smoke test (`10`).
- For tool-use experiments: generate the dataset, adapt predictions, then evaluate.
The scripts assume a GPU-enabled Python environment with the packages commonly used here: `torch`, `transformers`, `datasets`, `peft`, `trl`.

Several scripts use `device_map="auto"` and fp16 settings, so CUDA availability is expected for practical runtime.
- This repo is script-centric and intentionally iterative; the numbered files reflect the progression of the capstone work.
- `outputs/` can become large quickly because checkpoints and tokenizer/model artifacts are stored there.
- The `.gitignore` is a general Python template and may need extension if you want to exclude large training artifacts from version control.