diff --git a/cli/README.md b/cli/README.md
new file mode 100644
index 0000000..004e7c9
--- /dev/null
+++ b/cli/README.md
@@ -0,0 +1,118 @@
+# superml-cli
+
+A fast, low-token CLI for ML engineering workflows, powered by Claude Code.
+
+Runs the same workflows as the [SuperML plugin](https://github.com/leeroo-ai/superml) (`plan`, `debug`, `research`, `verify`, `iterate`, `experiment`) but as direct terminal commands — no plugin overhead, no session-start hook injection, no leeroopedia auth required.
+
+## Why
+
+Using SuperML skills inside Claude Code injects ~3,000–5,000 tokens of context per session (the full `using-superml` SKILL.md + session-start hook). This CLI bypasses that entirely: each command is a single `claude -p` call with a ~300-token compact prompt.
+
+| | SuperML plugin | superml-cli |
+|---|---|---|
+| Auth required | leeroopedia API key for KB mode | Claude Code (OAuth) |
+| Tokens per call | ~5,000–10,000 (plugin context + hooks) | ~500–1,500 |
+| Invocation | `/ml-plan` inside Claude Code | `superml plan "..."` in terminal |
+| Speed | Session startup + hook injection | Direct API call, streams immediately |
+
+## Requirements
+
+- [Claude Code](https://claude.ai/code) installed and authenticated (`claude --version`)
+- Bash
+
+No Python, no `pip install`, no API key beyond your existing Claude subscription.
+
+## Install
+
+```bash
+git clone https://github.com/BlackhatShiftey/superml-cli
+cd superml-cli
+bash install.sh
+```
+
+Or one-liner:
+
+```bash
+bash <(curl -fsSL https://raw.githubusercontent.com/BlackhatShiftey/superml-cli/main/install.sh)
+```
+
+## Usage
+
+```bash
+superml <skill> "<task>" [--model haiku|sonnet|opus]
+```
+
+### Skills
+
+| Skill | When to use | Example |
+|---|---|---|
+| `plan` | Starting a new ML project or feature | `superml plan "fine-tune Llama 3.1 8B with QLoRA on 1xA100"` |
+| `debug` | Something broke (OOM, NaN, crash, slow) | `superml debug "CUDA OOM, batch_size=8, seq_len=2048, A100 80GB"` |
+| `research` | Understand a framework or technique | `superml research "how does vLLM chunked prefill work"` |
+| `verify` | Check code/config for bugs | `superml verify "$(cat train_config.yaml)"` |
+| `iterate` | Improve results after an experiment | `superml iterate "tried rank-8 LoRA, loss 0.35, not converging after 2k steps"` |
+| `experiment` | Design a reproducible experiment | `superml experiment "compare LoRA rank 8 vs 16 on MMLU 5-shot"` |
+
+### Examples
+
+```bash
+# Planning
+superml plan "multi-node DeepSpeed ZeRO-3 training on 8xH100 with gradient checkpointing"
+
+# Debugging
+superml debug "loss NaN after step 200, LR=3e-4, grad_clip=1.0, bf16, Llama-2-13B"
+
+# Research
+superml research "flash attention v2 vs SDPA — when does each win"
+
+# Verify a config file
+superml verify "$(cat axolotl_config.yaml)"
+
+# Iteration
+superml iterate "QLoRA rank=8 alpha=16, eval_loss=0.41 at 1k steps, baseline=0.38"
+
+# Use sonnet for harder tasks
+superml plan --model sonnet "production vLLM serving with autoscaling on AWS EKS"
+```
+
+### Model selection
+
+Default model is `haiku` (fastest, cheapest). Override per-command or globally:
+
+```bash
+# Per-command
+superml debug --model sonnet "..."
+
+# Global override
+export SUPERML_MODEL=sonnet
+superml plan "..."
+```
+
+## Response format
+
+Every response includes these sections:
+
+- **Main content** (Plan / Diagnosis / Answer / etc.) — concrete, runnable output
+- **Verify** — exact command to confirm it worked
+- **References** — 3+ links to official docs
+- **Pitfalls** — 3+ specific failure modes with exact fixes
+
+## Contributing
+
+The skill prompts live in `skills/`. Each is a compact (~300-token) system prompt that captures the essential rules of the corresponding SuperML workflow skill.
+
+To improve a skill:
+1. Edit `skills/<skill>.md`
+2. Test: `superml <skill> "a representative task"`
+3. Check that the output includes all required sections and cites real sources
+4. Submit a PR
+
+## Relation to SuperML
+
+This project is a companion to the [SuperML plugin](https://github.com/leeroo-ai/superml). The skill workflows (`plan → verify → experiment → iterate`) are the same. The difference is invocation: plugin skills run inside Claude Code with full context; this CLI runs standalone with a minimal prompt.
+
+If you have a leeroopedia API key, the plugin's KB mode gives richer grounding. Without one, this CLI is the faster path.
+
+## License
+
+MIT
diff --git a/cli/grounding.md b/cli/grounding.md
new file mode 100644
index 0000000..adcb104
--- /dev/null
+++ b/cli/grounding.md
@@ -0,0 +1,44 @@
+
+## Grounding (active — WebFetch required)
+
+Before writing ANY content, use WebFetch on 2-3 official doc pages for the specific frameworks in this task. Extract exact API names, config key spellings, and version-specific behavior from what you fetch. Do not write code or analysis until you have fetched at least 2 URLs.
+
+After each fetch, extract 1-2 specific details (exact flag name, required version, config key spelling) to cite inline.
+
+Start your response with:
+> Grounding: fetched [URL1], [URL2], ...
+
+Cite every technical claim: [Label](URL) — "short quote from fetched page"
+
+**URL registry — pick the ones relevant to this task:**
+
+Training / Fine-tuning:
+- https://huggingface.co/docs/transformers
+- https://huggingface.co/docs/peft
+- https://huggingface.co/docs/trl
+- https://github.com/axolotl-ai-cloud/axolotl
+- https://docs.unsloth.ai
+
+Serving:
+- https://docs.vllm.ai
+- https://huggingface.co/docs/text-generation-inference
+- https://sgl-project.github.io
+
+Distributed:
+- https://www.deepspeed.ai/docs
+- https://pytorch.org/docs/stable/fsdp.html
+- https://github.com/NVIDIA/Megatron-LM
+
+Agents / RAG:
+- https://python.langchain.com/docs
+- https://langchain-ai.github.io/langgraph
+- https://docs.llamaindex.ai
+
+Evaluation:
+- https://github.com/EleutherAI/lm-evaluation-harness
+- https://docs.ragas.io
+
+General:
+- https://pytorch.org/docs/stable
+- https://docs.python.org/3/library
+- https://docs.github.com/en/actions
diff --git a/cli/install.sh b/cli/install.sh
new file mode 100644
index 0000000..0b4ac95
--- /dev/null
+++ b/cli/install.sh
@@ -0,0 +1,19 @@
+#!/usr/bin/env bash
+# superml-cli installer
+# Usage: bash install.sh
+set -euo pipefail
+
+REPO_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+BIN_DIR="${HOME}/.local/bin"
+
+mkdir -p "$BIN_DIR"
+ln -sf "${REPO_DIR}/superml" "${BIN_DIR}/superml"
+echo "Installed: ${BIN_DIR}/superml -> ${REPO_DIR}/superml"
+
+# Check if ~/.local/bin is on PATH
+if ! echo "$PATH" | grep -q "${BIN_DIR}"; then
+  echo ""
+  echo "Add to your shell profile (~/.bashrc or ~/.zshrc):"
+  echo " export PATH=\"\$HOME/.local/bin:\$PATH\""
+  echo "Then run: source ~/.bashrc"
+fi
diff --git a/cli/settings.json b/cli/settings.json
new file mode 100644
index 0000000..c72c6b7
--- /dev/null
+++ b/cli/settings.json
@@ -0,0 +1,3 @@
+{
+  "enabledPlugins": {}
+}
diff --git a/cli/skills/debug.md b/cli/skills/debug.md
new file mode 100644
index 0000000..ccee489
--- /dev/null
+++ b/cli/skills/debug.md
@@ -0,0 +1,27 @@
+You are a senior ML engineer diagnosing a failure. Identify the root cause and give the exact fix.
+
+Error categories to check: OOM (estimate memory: model params × dtype bytes × overhead), NaN/divergence (loss scale, gradient clipping, LR), CUDA error (driver/toolkit mismatch, device index), shape mismatch (batch dim, seq_len, hidden_dim), slow throughput (dataloader bottleneck, micro-batch size, compilation), dependency conflict (package version pinning).
+
+Response format — ALL sections required:
+
+## Diagnosis
+Root cause(s) with reasoning. State which category this falls into.
+
+## Fix
+Exact code change or config key/value. Not "try reducing X" — give the new value. If multiple causes, fix each one.
+
+## Verify
+Exact command to confirm the fix worked and what output to expect.
+
+## References
+- [Source](URL) — what it covers
+(3+ links: framework troubleshooting docs, GitHub issues for this error, config references)
+
+## Pitfalls
+1. **Common mistake when fixing this** — why it backfires — what to check instead
+(3+ with specific failure+fix)
+
+Hard rules:
+- Give exact values, not directions ("set gradient_checkpointing=True" not "enable gradient checkpointing")
+- Include memory math for OOM: model_params × bytes_per_param × factor = X GB
+- State the minimum framework version if the fix requires a specific version
diff --git a/cli/skills/experiment.md b/cli/skills/experiment.md
new file mode 100644
index 0000000..436db99
--- /dev/null
+++ b/cli/skills/experiment.md
@@ -0,0 +1,22 @@
+You are a senior ML engineer designing a reproducible experiment.
+
+Response format — ALL sections required:
+
+## Experiment Design
+- **Hypothesis**: what you're testing (falsifiable statement)
+- **Metric**: exact metric name and how to compute it (e.g., "eval loss after 1k steps, logged via trainer.evaluate()")
+- **Baseline**: what you compare against and why it's a fair baseline
+- **Variables**: exactly what changes between conditions; everything else held constant
+
+## Setup
+Exact commands and configs to reproduce both conditions. Include seed, framework version, hardware requirements.
+
+## Verify
+How to confirm the experiment ran correctly (not just completed) — e.g., expected loss range at step 0, expected throughput, expected GPU utilization.
+
+## References
+- [Source](URL) — what it covers
+
+## Pitfalls
+1. **Confound** — how to control for it — why it matters for interpreting results
+(3+ — common: data ordering, random seed, warmup steps, checkpoint selection bias)
diff --git a/cli/skills/iterate.md b/cli/skills/iterate.md
new file mode 100644
index 0000000..a34582a
--- /dev/null
+++ b/cli/skills/iterate.md
@@ -0,0 +1,25 @@
+You are a senior ML engineer proposing next steps after an experiment. Rank alternatives by expected impact and cost.
+
+Response format — ALL sections required:
+
+## Next Steps (ranked by expected ROI)
+For each:
+1. **Hypothesis** — exact change — expected outcome — why this addresses the root cause
+
+(3+ ranked alternatives. #1 should be the highest-confidence fix, not the most ambitious one.)
+
+## Verify
+For the top hypothesis: exact metric to watch and what success/failure looks like numerically.
+
+## References
+- [Source](URL) — what it covers
+(3+ — prioritize ablation studies, framework tuning guides, or similar failure reports)
+
+## Pitfalls
+1. **Common trap when iterating on this problem** — why it wastes time — what to check first
+(3+)
+
+Rules:
+- State exact hyperparameter changes (LR: 3e-4 → 1e-4, not "lower the learning rate")
+- If suggesting architecture changes, include the memory delta
+- If a hypothesis requires >2x training time, flag it explicitly
diff --git a/cli/skills/plan.md b/cli/skills/plan.md
new file mode 100644
index 0000000..a4f83bc
--- /dev/null
+++ b/cli/skills/plan.md
@@ -0,0 +1,25 @@
+You are a senior ML engineer. Build a concrete, runnable implementation plan for the given goal.
+
+Before writing, identify the exact frameworks, versions, and hardware involved. Every API name, config key, and flag must be spelled character-for-character correctly — wrong keys cause silent failures.
+
+Response format — ALL sections required, no exceptions:
+
+## Plan
+Numbered steps. Each step with code or config must be fully runnable. Include install commands. Show exact flag names, not paraphrased descriptions. If a step requires a specific version, state it.
+
+## Verify
+Exact command(s) the user runs to confirm each step succeeded.
+
+## References
+- [Framework — Section](URL) — what it covers
+(3+ links to official docs or source)
+
+## Pitfalls
+1. **Failure mode** — exact fix — when it triggers
+(3+ specific warnings with exact fix, not vague advice)
+
+Hard rules:
+- No deprecated APIs: `datetime.utcnow` → `datetime.now(timezone.utc)`, `declarative_base()` → `class Base(DeclarativeBase): pass`
+- Config keys must be exact: `role-to-assume` not `role-to-arn`, `timeout-minutes` not `timeout`
+- Every code block needs a corresponding Verify command
+- Concrete values (exact batch size, learning rate, rank) not ranges unless explaining tradeoffs
diff --git a/cli/skills/research.md b/cli/skills/research.md
new file mode 100644
index 0000000..6dbe5a6
--- /dev/null
+++ b/cli/skills/research.md
@@ -0,0 +1,19 @@
+You are a senior ML engineer answering a technical question. Ground every claim in documented behavior.
+
+Response format — ALL sections required:
+
+## Answer
+Direct, implementation-oriented explanation. Concrete values and configs, not abstractions. Include a minimal working example or config snippet when it clarifies the concept.
+
+## References
+- [Source — Section](URL) — what it covers
+(3+ links: official docs, papers, or authoritative source code)
+
+## Pitfalls
+1. **Version-specific gotcha or common misunderstanding** — exact behavior — when it applies
+(3+ specific warnings, not generic advice)
+
+Rules:
+- Cite specific versions when behavior differs across versions (e.g., "vLLM ≥0.4.0 changed X")
+- If the answer depends on hardware (A100 vs H100, PCIe vs NVLink), say so explicitly
+- Flag anything that requires a paid tier, specific driver version, or non-default build
diff --git a/cli/skills/verify.md b/cli/skills/verify.md
new file mode 100644
index 0000000..12cd12a
--- /dev/null
+++ b/cli/skills/verify.md
@@ -0,0 +1,23 @@
+You are a senior ML engineer reviewing code or config for correctness. Be exhaustive.
+
+Check for:
+1. Deprecated APIs: `datetime.utcnow` → `datetime.now(timezone.utc)`, `declarative_base()` → `class Base(DeclarativeBase): pass`, `default=datetime.utcnow` → `default=lambda: datetime.now(timezone.utc)`, `onupdate=datetime.utcnow` → `onupdate=lambda: datetime.now(timezone.utc)`
+2. Wrong config keys (character-for-character): `role-to-assume` not `role-to-arn`, `timeout-minutes` not `timeout`
+3. Shape/dtype mismatches and off-by-one errors in seq_len, indices, or strides
+4. Missing required fields, wrong defaults, or misconfigured training hyperparams
+5. GPU memory issues: batch size × seq_len × hidden_dim × dtype_bytes × overhead > GPU VRAM
+
+Response format — ALL sections required:
+
+## Issues Found
+For each issue: location (line/key), what's wrong, exact fix.
+If no issues found: say so explicitly with a one-line justification per check.
+
+## Fixed Code
+The corrected version if any issues were found. Omit if clean.
+
+## References
+- [Source](URL) — what it covers
+
+## Pitfalls
+Additional risk areas to watch in this kind of code (even if not present here).
diff --git a/cli/superml b/cli/superml
new file mode 100644
index 0000000..0ef3d91
--- /dev/null
+++ b/cli/superml
@@ -0,0 +1,139 @@
+#!/usr/bin/env bash
+# superml - ML engineering CLI
+# Usage: superml <skill> <task> [--model haiku|sonnet|opus] [--ground] [--no-log]
+set -euo pipefail
+
+SUPERML_DIR="$(cd "$(dirname "$(readlink -f "${BASH_SOURCE[0]}")")" && pwd)"
+SKILLS_DIR="${SUPERML_DIR}/skills"
+
+# Use OAuth (claude.ai subscription) instead of API key billing
+unset ANTHROPIC_API_KEY
+
+# Defaults
+MODEL="${SUPERML_MODEL:-haiku}"
+GROUND=false
+NO_LOG=false
+SKILL=""
+TASK_PARTS=()
+
+# Parse args
+while [[ $# -gt 0 ]]; do
+  case "$1" in
+    --model|-m) MODEL="$2"; shift 2 ;;
+    --ground|-g) GROUND=true; shift ;;
+    --no-log) NO_LOG=true; shift ;;
+    -*) echo "Unknown flag: $1" >&2; exit 1 ;;
+    *)
+      if [[ -z "$SKILL" ]]; then
+        SKILL="$1"
+      else
+        TASK_PARTS+=("$1")
+      fi
+      shift
+      ;;
+  esac
+done
+
+TASK="${TASK_PARTS[*]:-}"
+
+if [[ -z "$SKILL" || -z "$TASK" ]]; then
+  cat >&2 <<'EOF'
+superml - ML engineering CLI
+
+Usage:
+  superml <skill> <task> [--model haiku|sonnet|opus] [--ground] [--no-log]
+
+Skills:
+  plan        Plan an ML implementation
+  debug       Debug an ML failure (OOM, NaN, crash, slow throughput)
+  research    Research a framework or technique
+  verify      Verify code/config for bugs and deprecated APIs
+  iterate     Get ranked next steps after an experiment
+  experiment  Design a reproducible experiment
+
+Flags:
+  --ground, -g  Fetch official docs before responding (slower, cites sources)
+  --model, -m   Model: haiku (default), sonnet, opus
+  --no-log      Skip writing to experiments/journal.md this call
+
+Examples:
+  superml plan "fine-tune Llama 3.1 8B with QLoRA on 1xA100"
+  superml plan --ground "fine-tune Llama 3.1 8B with QLoRA on 1xA100"
+  superml debug "CUDA OOM during forward pass, batch_size=4, 80GB GPU"
+  superml research --ground "how does vLLM chunked prefill work"
+  superml verify "$(cat train_config.yaml)"
+  superml iterate "tried rank-8 LoRA, loss 0.35, not converging after 2k steps"
+  superml experiment "compare LoRA rank 8 vs 16 on MMLU 5-shot"
+
+Env:
+  SUPERML_MODEL=haiku|sonnet|opus  override default model (default: haiku)
+EOF
+  exit 1
+fi
+
+SKILL_FILE="${SKILLS_DIR}/${SKILL}.md"
+if [[ ! -f "$SKILL_FILE" ]]; then
+  echo "Unknown skill '${SKILL}'. Available: plan debug research verify iterate experiment" >&2
+  exit 1
+fi
+
+# Build system prompt
+SYSTEM_PROMPT="$(cat "$SKILL_FILE")"
+if [[ "$GROUND" == "true" ]]; then
+  SYSTEM_PROMPT="${SYSTEM_PROMPT}$(cat "${SUPERML_DIR}/grounding.md")"
+fi
+
+# Inject persistent project context (bash reads files — automatic, no Claude instruction needed)
+CONTEXT=""
+if [[ -f "./MEMORY.md" ]]; then
+  CONTEXT="${CONTEXT}
+## Hardware & Setup (MEMORY.md)
+$(cat ./MEMORY.md)
+"
+fi
+if [[ -f "./experiments/lessons.md" ]]; then
+  CONTEXT="${CONTEXT}
+## Hard-won Rules (experiments/lessons.md)
+$(cat ./experiments/lessons.md)
+"
+fi
+if [[ -f "./experiments/journal.md" ]]; then
+  CONTEXT="${CONTEXT}
+## Recent Experiments (experiments/journal.md — last 150 lines)
+$(tail -150 ./experiments/journal.md)
+"
+fi
+
+if [[ -n "$CONTEXT" ]]; then
+  SYSTEM_PROMPT="${SYSTEM_PROMPT}
+
+---
+## Project Context
+
+${CONTEXT}"
+fi
+
+# Run claude, stream to terminal, capture output for journal
+RESPONSE_FILE="$(mktemp)"
+set +e
+claude -p \
+  --settings "${SUPERML_DIR}/settings.json" \
+  --system-prompt "$SYSTEM_PROMPT" \
+  --model "$MODEL" \
+  "$TASK" | tee "$RESPONSE_FILE"
+EXIT_CODE="${PIPESTATUS[0]}"
+set -e
+
+# Append to journal on success (bash writes — automatic regardless of skill used)
+if [[ "$NO_LOG" == "false" && "$EXIT_CODE" -eq 0 ]]; then
+  mkdir -p ./experiments
+  {
+    printf '\n### %s — %s\n' "$(date '+%Y-%m-%d %H:%M')" "$SKILL"
+    printf '**Task**: %s\n\n' "$TASK"
+    cat "$RESPONSE_FILE"
+    printf '\n\n---\n'
+  } >> "./experiments/journal.md"
+fi
+
+rm -f "$RESPONSE_FILE"
+exit "$EXIT_CODE"
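
Reviewer sketch (not part of the patch): the argument loop in `cli/superml` takes the first non-flag word as the skill and joins the remaining words into the task, so flags may appear before or after the skill name. The snippet below restates that loop as a standalone POSIX-shell function so the behavior can be checked in isolation. The function name `parse_superml_args`, the string-based task accumulation, and the explicit missing-value check for `--model` are assumptions added for illustration; the patch itself uses a bash array and reads `$2` directly.

```shell
# Hypothetical helper mirroring the argument loop in cli/superml.
# Sets MODEL, GROUND, NO_LOG, SKILL and TASK as globals.
parse_superml_args() {
  MODEL="${SUPERML_MODEL:-haiku}"
  GROUND=false
  NO_LOG=false
  SKILL=""
  TASK=""
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --model|-m)
        # Assumption: report a missing value instead of relying on `set -u`.
        if [ "$#" -lt 2 ]; then
          echo "--model requires a value" >&2
          return 2
        fi
        MODEL="$2"
        shift 2
        ;;
      --ground|-g)
        GROUND=true
        shift
        ;;
      --no-log)
        NO_LOG=true
        shift
        ;;
      -*)
        echo "Unknown flag: $1" >&2
        return 1
        ;;
      *)
        # First positional word is the skill; the rest are joined with spaces.
        if [ -z "$SKILL" ]; then
          SKILL="$1"
        else
          TASK="${TASK:+$TASK }$1"
        fi
        shift
        ;;
    esac
  done
}

# Flags before the skill parse the same way as flags after it.
parse_superml_args --model sonnet plan "fine-tune Llama 3.1 8B"
echo "$SKILL | $MODEL | $TASK"   # prints: plan | sonnet | fine-tune Llama 3.1 8B
```

Returning a status instead of calling `exit` keeps the function usable from a test harness; the script in the patch exits directly, which is equally reasonable for a top-level CLI.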