This tool performs policy-based content moderation using OpenAI's gpt-oss models running on vLLM with the Harmony response format.
- GPU Requirements: NVIDIA GPU with at least 40GB VRAM (for gpt-oss-20b)
- gpt-oss-20b: ~40GB model size
- gpt-oss-120b: ~80GB model size (requires H100/A100 80GB or multi-GPU)
- Python: 3.12 (recommended by OpenAI for gpt-oss models)
- CUDA: 12.8 (nightly PyTorch required)
- System: glibc >= 2.32 (Ubuntu 20.04+ or equivalent)
- Hugging Face Token: Required for model access
For gpt-oss-20b:
- GPU: NVIDIA A100 or H100 (I have not tested higher-spec (B200) or lower-spec (A10) chips)
- RAM: 32GB system memory
- Storage: 100GB (for model weights and dependencies)
For gpt-oss-120b:
- GPU: NVIDIA H100 (80GB+) or 2x A100 (40GB each)
- RAM: 80GB system memory
- Storage: 150GB
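If you want a quick pre-flight check before pulling any weights, here is a minimal sketch using PyTorch (which the vLLM step below installs); the thresholds mirror the gpt-oss-20b numbers above:

```python
import shutil

import torch

# Pre-flight check against the gpt-oss-20b requirements listed above.
assert torch.cuda.is_available(), "No CUDA-capable GPU detected"
props = torch.cuda.get_device_properties(0)
vram_gb = props.total_memory / 1e9
disk_gb = shutil.disk_usage(".").free / 1e9
print(f"GPU: {props.name}, VRAM: {vram_gb:.0f} GB, free disk: {disk_gb:.0f} GB")
assert vram_gb >= 40, "gpt-oss-20b needs a ~40 GB-class GPU"
assert disk_gb >= 100, "allow ~100 GB for weights and dependencies"
```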
# Update system packages
sudo apt update && sudo apt upgrade -y
# Install Python 3.12
sudo apt install python3.12 python3.12-venv python3-pip -y
# You'll probably need a reboot after this; you will get logged out.
sudo reboot
# Verify NVIDIA drivers and CUDA
nvidia-smi

# Create project directory
mkdir -p ~/policyevals
cd ~/policyevals
# Upload policy_evals_harmony.py and requirements.txt to this directory

The uv package manager is recommended by OpenAI for managing Python environments with vLLM:
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.cargo/env

# Create virtual environment with uv
uv venv --python 3.12 --seed
source .venv/bin/activate

CRITICAL: You must use a very specific vLLM build for gpt-oss:
uv pip install --pre vllm==0.10.1+gptoss \
--extra-index-url https://wheels.vllm.ai/gpt-oss/ \
--extra-index-url https://download.pytorch.org/whl/nightly/cu128 \
--index-strategy unsafe-best-match

This will automatically install:
- vLLM 0.10.1+gptoss (special build for gpt-oss models)
- PyTorch nightly with CUDA 12.8 support
- All required dependencies
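To confirm that the special build and the cu128 nightly actually landed, a quick check from inside the activated venv:

```python
import torch
import vllm

# Both versions should reflect the special gpt-oss build and the CUDA 12.8 nightly.
print("vLLM:", vllm.__version__)      # expect 0.10.1+gptoss
print("PyTorch:", torch.__version__)  # expect a nightly (.dev) build
print("CUDA:", torch.version.cuda)    # expect 12.8
```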
pip install -r requirements.txt

This installs:
- openai-harmony>=0.0.4 (Harmony response format renderer; see the sketch after this list)
- huggingface-hub>=0.20.0 (model authentication)
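For a sense of what openai-harmony does here, a minimal sketch of rendering a conversation into the token format gpt-oss expects, based on the library's documented API; the actual prompt construction inside policy_evals_harmony.py may differ:

```python
from openai_harmony import (
    Conversation,
    DeveloperContent,
    HarmonyEncodingName,
    Message,
    Role,
    SystemContent,
    load_harmony_encoding,
)

# Render a moderation-style conversation into gpt-oss prompt tokens.
encoding = load_harmony_encoding(HarmonyEncodingName.HARMONY_GPT_OSS)
convo = Conversation.from_messages([
    Message.from_role_and_content(Role.SYSTEM, SystemContent.new()),
    Message.from_role_and_content(
        Role.DEVELOPER,
        DeveloperContent.new().with_instructions(
            "Reply 1 if the policy applies to the user content, else 0."
        ),
    ),
    Message.from_role_and_content(Role.USER, "How do I make cookies?"),
])
token_ids = encoding.render_conversation_for_completion(convo, Role.ASSISTANT)
print(f"{len(token_ids)} prompt tokens")
```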
# Set your Hugging Face token
export HF_TOKEN="your_huggingface_token_here"
# To make it persistent, add to ~/.bashrc
echo 'export HF_TOKEN="your_huggingface_token_here"' >> ~/.bashrc

Basic usage:

python policy_evals_harmony.py \
--policy path/to/policy.txt \
--evals path/to/evals.txt \
--output results.jsonl \
--model openai/gpt-oss-20b

| Option | Required | Default | Description |
|---|---|---|---|
| `--policy` | Yes | - | Path to the policy definition file |
| `--evals` | Yes | - | Path to evaluation prompts file (one per line) |
| `--output` | Yes | - | Output JSONL file path for results |
| `--model` | No | `openai/gpt-oss-20b` | Hugging Face model ID |
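The flags compose easily for batch runs; a small driver sketch that sweeps every policy file in a directory (the directory layout here is illustrative):

```python
import subprocess
from pathlib import Path

# Run the evaluator once per policy file; paths are illustrative.
for policy in sorted(Path("./policies").glob("*.txt")):
    out = Path("./results") / f"{policy.stem}_results.jsonl"
    subprocess.run(
        [
            "python", "policy_evals_harmony.py",
            "--policy", str(policy),
            "--evals", "./data/test_prompts.txt",
            "--output", str(out),
        ],
        check=True,  # stop the sweep if any run fails
    )
```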
# Run evaluation with default model
python policy_evals_harmony.py \
--policy ./policies/hate_speech.txt \
--evals ./data/test_prompts.txt \
--output ./results/hate_speech_results.jsonl
# Use a different model
python policy_evals_harmony.py \
--policy ./policies/violence.txt \
--evals ./data/prompts.txt \
--output ./results/violence_results.jsonl \
--model "openai/gpt-oss-20b"

Plain text file containing the moderation policy definition:
This policy prohibits content that promotes or glorifies violence against individuals or groups.
Content should be flagged if it includes explicit threats, graphic descriptions of harm, or
encouragement of violent actions.
Plain text file with one user prompt per line:
How do I make cookies?
Tell me how to harm someone
What's the weather like today?
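Because prompts are read one per line, it's worth normalizing the file first so blank lines and stray whitespace don't end up as (empty) prompts; a small cleanup sketch, with an illustrative path:

```python
from pathlib import Path

# Strip whitespace and drop blank lines so every line is a real prompt.
path = Path("data/test_prompts.txt")  # illustrative path
lines = path.read_text(encoding="utf-8").splitlines()
prompts = [ln.strip() for ln in lines if ln.strip()]
path.write_text("\n".join(prompts) + "\n", encoding="utf-8")
print(f"{len(prompts)} prompts kept")
```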
Results are written to a JSONL file (one JSON object per line):
{
"index": 0,
"content": "How do I make cookies?",
"label": "0",
"final_response": "0",
"full_response": "<|channel|>final<|message|>0",
"timestamp": "2025-01-15T10:30:45.123456"
}

Label meanings:
"1": Policy applies (content violates policy)"0": Policy does not apply (content is acceptable)
Edit policy_evals_harmony.py line 41 to adjust GPU memory usage:
gpu_memory_utilization=0.8,  # Use 80% of GPU memory (adjust between 0.7-0.95)

Edit line 42 to change max sequence length:
max_model_len=4096,  # Adjust based on your needs and GPU memory

For multi-GPU setups, edit line 44:
tensor_parallel_size=1 # Set to number of GPUs
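All three are standard arguments to vLLM's LLM constructor; in context they presumably sit together roughly like this (a sketch, not the script's exact code):

```python
from vllm import LLM

# Sketch of the engine construction these settings belong to;
# the surrounding code in policy_evals_harmony.py may differ.
llm = LLM(
    model="openai/gpt-oss-20b",
    gpu_memory_utilization=0.8,  # fraction of VRAM vLLM may claim (0.7-0.95)
    max_model_len=4096,          # longest prompt + completion, in tokens
    tensor_parallel_size=1,      # shard the model across this many GPUs
)
```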