diff --git a/skills/pyem-model-generator/README.md b/skills/pyem-model-generator/README.md new file mode 100644 index 0000000..9ed74c5 --- /dev/null +++ b/skills/pyem-model-generator/README.md @@ -0,0 +1,87 @@ +# pyem-model-generator skill + +Use this skill to generate standalone computational cognitive model code and a matching parameter-recovery notebook. + +## What this skill generates + +Given a task/model description, the skill generates files in a **single directory**: + +- `{modclass}_utils.py` +- `{model_name}.py` +- `{model_name}.ipynb` + +The generated model file follows a consistent contract: + +- attributes: `mod_desc`, `mod_spec`, `mod_id`, `MODEL` +- functions: `mod_params`, `mod_sim`, `mod_fit` + +## Required references bundled with the skill + +- `references/rl.json` +- `references/bayes.json` +- `references/glm.json` +- `references/modelclass-utils-template.py` +- `references/model-file-template.py` +- `references/example-notebook-template.json` +- `references/parameter-recovery-notebook.md` +- `references/pyem-runtime-contract.md` + +## Quick start (first-time users) + +1. Describe your task and model in plain language (or equations). +2. Ask the skill to generate: + - `{modclass}_utils.py` + - `{model_name}.py` + - `{model_name}.ipynb` +3. If details are missing, answer the skill’s follow-up questions. +4. Review generated files and run your analysis workflow. 
+ +## Notes on generated files + +### Shared utils file + +`{modclass}_utils.py` should define shared helpers used across model files: + +- `_alloc_sim`, `_alloc_fit` +- `ModelSpec`, `ParamDef` +- `spec_to_id`, `build_params` +- `PARAM_REGISTRY` + +### Model file + +`{model_name}.py` imports math helpers from pyEM: + +```python +from pyem.utils.math import norm2alpha, norm2beta, softmax, calc_fval +``` + +And imports shared helpers from: + +```python +from {modclass}_utils import _alloc_sim, _alloc_fit, ModelSpec, spec_to_id, build_params +``` + +### Notebook file + +The notebook template uses: + +```python +from pyem.api import EMModel +``` + +and follows a simulation → fit → recovery plot workflow. + +## Example prompt + +```text +Use pyem-model-generator. +Generate standalone files in one directory: +- social_utils.py +- social_rw.py +- social_rw.ipynb + +Task: three-option social learning task with 4 blocks x 12 trials and 100 agents. +Model: dual-value update equations for self and other values with softmax choice. +Include parameter recovery plots in the notebook. +Ask follow-up questions before generation if any details are ambiguous. +``` diff --git a/skills/pyem-model-generator/SKILL.md b/skills/pyem-model-generator/SKILL.md new file mode 100644 index 0000000..60e0e4a --- /dev/null +++ b/skills/pyem-model-generator/SKILL.md @@ -0,0 +1,96 @@ +--- +name: pyem-model-generator +description: Generate standalone computational cognitive model modules and example notebooks from free-text or reference specs, using a shared `modclass_utils.py` contract and per-model files with `mod_desc`, `mod_spec`, `mod_id`, `MODEL`, `mod_params`, `mod_sim`, and `mod_fit`. +--- + +# pyem-model-generator + +Generate all outputs into the **current working directory** (flat layout). 
+ +## Required local references + +- `references/rl.json` +- `references/bayes.json` +- `references/glm.json` +- `references/modelclass-utils-template.py` +- `references/model-file-template.py` +- `references/example-notebook-template.json` +- `references/parameter-recovery-notebook.md` +- `references/pyem-runtime-contract.md` + +Do not require repository path conventions like `pyem/models/...` or `examples/...`. + +## Output layout (flat) + +Write files in one directory: + +- `{modclass}_utils.py` +- `{model_name}.py` (one or more model files) +- `{model_name}.ipynb` (or one notebook per model class) + +## Clarification behavior + +If required details are missing, ask concise follow-up questions before generation: + +1. Task structure (`nsubjects`, `nblocks`, `ntrials`, choices, outcomes). +2. Parameter names/transforms/bounds/priors. +3. Equations (state update and choice rule). +4. Variant list and naming. +5. Desired output filenames. + +## Free-text parsing workflow + +When given prose/equations: + +1. Extract task flow, tensors, equations, and variants. +2. Normalize symbol names to valid Python variables. +3. Preserve equation intent in `mod_sim`/`mod_fit`. +4. Resolve ambiguities via targeted questions. 
+ ## Shared utility heuristic (required) + +Create one shared `{modclass}_utils.py` file containing only: + +- `_alloc_sim` +- `_alloc_fit` +- `ModelSpec` +- `ParamDef` +- `spec_to_id` +- `build_params` +- `PARAM_REGISTRY` + +Each `{model_name}.py` should import shared helpers with: + +```python +from {modclass}_utils import _alloc_sim, _alloc_fit, ModelSpec, spec_to_id, build_params +``` + +## Per-model file contract + +Each generated `{model_name}.py` must include: + +- attributes: `mod_desc`, `mod_spec`, `mod_id`, `MODEL` +- functions: `mod_params`, `mod_sim`, `mod_fit` + +Each model file should import math helpers directly from pyem: + +```python +from pyem.utils.math import norm2alpha, norm2beta, softmax, calc_fval +``` + +## Notebook requirements + +Generate the notebook from `references/example-notebook-template.json` and ensure it imports: + +```python +from pyem.api import EMModel +``` + +Do not use `from scipy.optimize import minimize` in generated notebooks. + +## Generation steps + +1. Select the closest anchor from `references/rl.json`, `references/bayes.json`, or `references/glm.json`. +2. Generate `{modclass}_utils.py` from `references/modelclass-utils-template.py`. +3. Generate each `{model_name}.py` from `references/model-file-template.py`. +4. Generate notebook(s) from `references/example-notebook-template.json` and `references/parameter-recovery-notebook.md`. 
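The per-model file contract above can be checked mechanically after generation. The following is a minimal sketch, not part of the skill itself: `missing_contract_names` and the `demo` module are hypothetical names introduced here for illustration, and the check only looks at the names listed in the contract, not at their behavior.

```python
import types

# Names taken from the per-model file contract in SKILL.md.
REQUIRED_ATTRIBUTES = ("mod_desc", "mod_spec", "mod_id", "MODEL")
REQUIRED_FUNCTIONS = ("mod_params", "mod_sim", "mod_fit")


def missing_contract_names(module):
    """Return contract names that are absent, or not callable where a function is expected."""
    missing = [name for name in REQUIRED_ATTRIBUTES if not hasattr(module, name)]
    missing += [
        name
        for name in REQUIRED_FUNCTIONS
        if not callable(getattr(module, name, None))
    ]
    return missing


# Hypothetical stand-in for a generated model module (not real generated output).
demo = types.ModuleType("demo_model")
demo.mod_desc = "demo"
demo.mod_spec = {"rl": {"softmax": ["beta"], "rw": ["alpha"]}}
demo.mod_id = "rl=rw(alpha)/softmax(beta)"
demo.MODEL = object()
demo.mod_params = lambda nsubj, rng=None: None
demo.mod_sim = lambda params, **kwargs: None
# mod_fit is deliberately left out to show a failing check.

print(missing_contract_names(demo))  # prints ['mod_fit']
```

In practice the same function could be pointed at a module loaded with `importlib.import_module("{model_name}")`, mirroring how the notebook template loads the model.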
diff --git a/skills/pyem-model-generator/references/bayes.json b/skills/pyem-model-generator/references/bayes.json new file mode 100644 index 0000000..3a048a7 --- /dev/null +++ b/skills/pyem-model-generator/references/bayes.json @@ -0,0 +1,18 @@ +{ + "model_class": "bayes", + "utils_file": "modclass_utils.py", + "models": [ + { + "model_name": "bayes_fish", + "model_file": "bayes_fish.py", + "notebook_file": "bayes_fish.ipynb", + "required_attributes": ["mod_desc", "mod_spec", "mod_id", "MODEL"], + "required_functions": ["mod_params", "mod_sim", "mod_fit"], + "shared_import": "from modclass_utils import _alloc_sim, _alloc_fit, ModelSpec, spec_to_id, build_params", + "math_import": "from pyem.utils.math import norm2alpha, norm2beta, softmax, calc_fval", + "parameters": ["beta", "lambda1"], + "sim_outputs": ["params", "choices", "observations", "posterior", "nll"], + "fit_outputs": ["npl", "nll", "all"] + } + ] +} diff --git a/skills/pyem-model-generator/references/example-notebook-template.json b/skills/pyem-model-generator/references/example-notebook-template.json new file mode 100644 index 0000000..f87cf46 --- /dev/null +++ b/skills/pyem-model-generator/references/example-notebook-template.json @@ -0,0 +1,73 @@ +{ + "nbformat": 4, + "nbformat_minor": 5, + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "name": "python" + } + }, + "cell_templates": [ + { + "cell_type": "markdown", + "source": [ + "# {model_title}\\n", + "\\n", + "## {task_title}\\n", + "This notebook demonstrates simulation, fitting, and parameter recovery." 
+ ] + }, + { + "cell_type": "code", + "source": [ + "import importlib\\n", + "import numpy as np\\n", + "import matplotlib.pyplot as plt\\n", + "from pyem.api import EMModel" + ] + }, + { + "cell_type": "code", + "source": [ + "script_fn = \"{model_name}\"\\n", + "nsubj = {nsubjects}\\n", + "nblocks = {nblocks}\\n", + "ntrials = {ntrials}\\n", + "module = importlib.import_module(f\"{script_fn}\")\\n", + "MODEL = module.MODEL\\n", + "print(script_fn, end=\"\\n\")\\n", + "print(MODEL.id, end=\"\\n\\n\")\\n", + "print(MODEL.desc, end=\"\\n\\n\")\\n", + "mod_params, mod_sim, mod_fit = MODEL.params, MODEL.sim, MODEL.fit\\n", + "param_names, param_xform, true_params = mod_params(nsubj)" + ] + }, + { + "cell_type": "code", + "source": [ + "sim_outp = mod_sim(true_params, nblocks=nblocks, ntrials=ntrials)\\n", + "sim_data = [[sim_outp['choices'][i, ...], sim_outp['rewards'][i, ...]] for i in range(nsubj)]\\n", + "len(sim_data)" + ] + }, + { + "cell_type": "code", + "source": [ + "model = EMModel(all_data=sim_data, fit_func=mod_fit, param_names=param_names, param_xform=param_xform)\\n", + "result = model.fit(verbose=1)\\n", + "result" + ] + }, + { + "cell_type": "code", + "source": [ + "fig = model.plot_recovery({'true_params': true_params, 'estimated_params': model.outfit['params']})\\n", + "fig" + ] + } + ] +} diff --git a/skills/pyem-model-generator/references/glm.json b/skills/pyem-model-generator/references/glm.json new file mode 100644 index 0000000..d0ce16b --- /dev/null +++ b/skills/pyem-model-generator/references/glm.json @@ -0,0 +1,18 @@ +{ + "model_class": "glm", + "utils_file": "modclass_utils.py", + "models": [ + { + "model_name": "glm_linear", + "model_file": "glm_linear.py", + "notebook_file": "glm_linear.ipynb", + "required_attributes": ["mod_desc", "mod_spec", "mod_id", "MODEL"], + "required_functions": ["mod_params", "mod_sim", "mod_fit"], + "shared_import": "from modclass_utils import _alloc_sim, _alloc_fit, ModelSpec, spec_to_id, build_params", + 
"math_import": "from pyem.utils.math import norm2alpha, norm2beta, softmax, calc_fval", + "parameters": ["w0", "w1", "sigma"], + "sim_outputs": ["params", "X", "y", "pred", "nll"], + "fit_outputs": ["npl", "nll", "all"] + } + ] +} diff --git a/skills/pyem-model-generator/references/model-file-template.py b/skills/pyem-model-generator/references/model-file-template.py new file mode 100644 index 0000000..a19d7e4 --- /dev/null +++ b/skills/pyem-model-generator/references/model-file-template.py @@ -0,0 +1,92 @@ +"""Template for one generated model module.""" + +from __future__ import annotations + +import numpy as np + +from pyem.utils.math import calc_fval, norm2alpha, norm2beta, softmax +from modclass_utils import ( + ModelSpec, + _alloc_fit, + _alloc_sim, + build_params, + spec_to_id, +) + + +mod_desc = """Replace with concise model description.""" +mod_spec = {"rl": {"softmax": ["beta"], "rw": ["alpha"]}} +mod_id = spec_to_id(mod_spec) + + +def mod_params(nsubj: int, rng: np.random.Generator | None = None): + """Generate parameter names, transforms, and true parameters.""" + return build_params(["beta", "alpha"], nsubj, rng) + + +def mod_sim(params: np.ndarray, nblocks: int = 4, ntrials: int = 12, **kwargs): + """Simulate behavior for this model variant.""" + nsubj = params.shape[0] + dat = _alloc_sim(nsubj, nblocks, ntrials, nchoices=2) + rng = np.random.default_rng(kwargs.get("seed", None)) + + beta = params[:, 0] + alpha = params[:, 1] + + for s in range(nsubj): + for b in range(nblocks): + dat["ev"][s, b, 0, :] = 0.5 + for t in range(ntrials): + p = softmax(dat["ev"][s, b, t, :], beta[s]) + c = rng.choice([0, 1], p=p) + r = float(rng.integers(0, 2)) + dat["choices"][s, b, t] = "A" if c == 0 else "B" + dat["rewards"][s, b, t] = r + dat["ch_prob"][s, b, t, :] = p + dat["pe"][s, b, t] = r - dat["ev"][s, b, t, c] + dat["ev"][s, b, t + 1, :] = dat["ev"][s, b, t, :] + dat["ev"][s, b, t + 1, c] = ( + dat["ev"][s, b, t, c] + alpha[s] * dat["pe"][s, b, t] + ) + 
dat["nll"][s] += -np.log(p[c] + 1e-12) + + dat["params"] = params + return dat + + +def mod_fit(params, choices, rewards, prior=None, output="npl"): + """Fit objective (npl/nll) with optional diagnostics.""" + beta = float(norm2beta(params[0])) + alpha = float(norm2alpha(params[1])) + + if not (1e-5 <= beta <= 20.0) or not (0.0 <= alpha <= 1.0): + return 1e7 + + nblocks, ntrials = rewards.shape + dat = _alloc_fit(nblocks, ntrials, nchoices=2) + + for b in range(nblocks): + dat["ev"][b, 0, :] = 0.5 + for t in range(ntrials): + c = 0 if choices[b, t] == "A" else 1 + p = softmax(dat["ev"][b, t, :], beta) + r = rewards[b, t] + dat["ch_prob"][b, t, :] = p + dat["pe"][b, t] = r - dat["ev"][b, t, c] + dat["ev"][b, t + 1, :] = dat["ev"][b, t, :] + dat["ev"][b, t + 1, c] = dat["ev"][b, t, c] + alpha * dat["pe"][b, t] + dat["nll"] += -np.log(p[c] + 1e-12) + + if output == "all": + return {"params": [beta, alpha], **dat} + return calc_fval(dat["nll"], np.asarray(params), prior=prior, output=output) + + +MODEL = ModelSpec( + id=mod_id, + spec=mod_spec, + desc=mod_desc, + params=mod_params, + sim=mod_sim, + fit=mod_fit, +) diff --git a/skills/pyem-model-generator/references/modelclass-utils-template.py b/skills/pyem-model-generator/references/modelclass-utils-template.py new file mode 100644 index 0000000..82f0ffc --- /dev/null +++ b/skills/pyem-model-generator/references/modelclass-utils-template.py @@ -0,0 +1,138 @@ +"""Shared utilities for generated model files. + +Keep this file lightweight and shared across all generated model modules. 
+""" + +from __future__ import annotations + +from dataclasses import dataclass +from typing import Callable, Dict, Sequence + +import numpy as np + + +@dataclass(frozen=True) +class ModelSpec: + """Container for one generated model variant.""" + + id: str + spec: dict + desc: str + params: Callable + sim: Callable + fit: Callable + + +@dataclass(frozen=True) +class ParamDef: + """Definition for one parameter in the registry.""" + + name: str + xform: Callable + init_fn: Callable + + +def spec_to_id(spec: dict) -> str: + """Convert a nested spec dictionary into a deterministic ID string.""" + block_order = ["rl", "cr", "link"] + op_alias = {"linear": "lin"} + + blocks = [] + for block in block_order: + if block not in spec or not spec[block]: + continue + ops = spec[block] + op_strs = [] + for op_name in sorted(ops.keys()): + args = ops[op_name] + name = op_alias.get(op_name, op_name) + if isinstance(args, dict): + for subop in sorted(args.keys()): + subargs = args[subop] + if not isinstance(subargs, (list, tuple)): + raise ValueError( + f"Arguments for {block}:{op_name}:{subop} must be a list" + ) + op_strs.append(f"{name}.{subop}({','.join(subargs)})") + elif isinstance(args, (list, tuple)): + op_strs.append(f"{name}({','.join(args)})") + else: + raise ValueError( + f"Arguments for {block}:{op_name} must be a list or dict" + ) + blocks.append(f"{block}={'/'.join(op_strs)}") + return "|".join(blocks) + + +def _alloc_sim(nsubj: int, nblocks: int, ntrials: int, nchoices: int = 2) -> Dict[str, np.ndarray]: + """Allocate common simulation arrays.""" + return { + "choices": np.zeros((nsubj, nblocks, ntrials), dtype=object), + "rewards": np.zeros((nsubj, nblocks, ntrials), dtype=float), + "ev": np.zeros((nsubj, nblocks, ntrials + 1, nchoices), dtype=float), + "ch_prob": np.zeros((nsubj, nblocks, ntrials, nchoices), dtype=float), + "pe": np.zeros((nsubj, nblocks, ntrials), dtype=float), + "nll": np.zeros((nsubj,), dtype=float), + } + + +def _alloc_fit(nblocks: int, 
ntrials: int, nchoices: int = 2) -> Dict[str, np.ndarray]: + """Allocate common fitting arrays.""" + return { + "ev": np.zeros((nblocks, ntrials + 1, nchoices), dtype=float), + "ch_prob": np.zeros((nblocks, ntrials, nchoices), dtype=float), + "pe": np.zeros((nblocks, ntrials), dtype=float), + "nll": 0.0, + } + + +def _norm2unit(x): + """Map unconstrained Gaussian-space values to the open unit interval.""" + x = np.asarray(x, dtype=float) + return 1.0 / (1.0 + np.exp(-x)) + + +def _norm2pos(x): + """Map unconstrained Gaussian-space values to positive reals.""" + x = np.asarray(x, dtype=float) + return np.exp(x) + + +PARAM_REGISTRY = { + "beta": ParamDef("beta", _norm2pos, lambda rng, n: rng.uniform(0.5, 8.0, size=n)), + "alpha": ParamDef("alpha", _norm2unit, lambda rng, n: rng.uniform(0.1, 0.9, size=n)), + "lambda1": ParamDef("lambda1", _norm2unit, lambda rng, n: rng.uniform(0.1, 0.9, size=n)), + "w0": ParamDef("w0", lambda x: x, lambda rng, n: rng.normal(0.0, 1.0, size=n)), + "w1": ParamDef("w1", lambda x: x, lambda rng, n: rng.normal(0.0, 1.0, size=n)), + "sigma": ParamDef("sigma", _norm2pos, lambda rng, n: rng.uniform(0.1, 2.0, size=n)), +} + + +def build_params( + param_names: Sequence[str], + nsubj: int, + rng: np.random.Generator | None = None, +) -> tuple[list[str], list[Callable], np.ndarray]: + """Build parameter transforms and sampled true params. 
+ + Returns a 3-tuple: + - param_names: list of parameter name strings + - param_xform: list of callables that map optimizer outputs (Gaussian / + unconstrained space) to natural-space parameter values; passed directly + to ``EMModel(param_xform=...)`` + - true_params: ``(nsubj, nparams)`` array of natural-space ground-truth + parameters generated by each ``ParamDef.init_fn``; used directly in + ``mod_sim`` and as the reference for recovery comparisons + """ + if rng is None: + rng = np.random.default_rng() + + true_params = np.zeros((nsubj, len(param_names)), dtype=float) + param_xform: list[Callable] = [] + + for i, name in enumerate(param_names): + p = PARAM_REGISTRY[name] + param_xform.append(p.xform) + true_params[:, i] = p.init_fn(rng, nsubj) + + return list(param_names), param_xform, true_params diff --git a/skills/pyem-model-generator/references/parameter-recovery-notebook.md b/skills/pyem-model-generator/references/parameter-recovery-notebook.md new file mode 100644 index 0000000..77efdf3 --- /dev/null +++ b/skills/pyem-model-generator/references/parameter-recovery-notebook.md @@ -0,0 +1,57 @@ +# Parameter recovery notebook pattern + +Use this reference to implement `{model_name}.ipynb` in a flat output directory, even when base example notebooks are unavailable. + +This pattern follows the bundled offline templates and anchor specs (`references/rl.json`, `references/bayes.json`, `references/glm.json`) so notebook generation does not require repository access: + +- Intro markdown title + task subtitle. +- Import block (`numpy`, plotting, model sim/fit, `EMModel`). +- Simulation setup cell. +- Simulation execution cell. +- Fit-and-recover cell using `EMModel.recover(...)`. +- Parameter recovery scatter plots with identity lines. + +## Required sections + +1. Model and task overview. +2. Parameter specification (true generating parameters). +3. Simulation run. +4. Fit simulated behavior. +5. Parameter recovery plot. +6. Brief interpretation. 
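One detail worth care when filling the bundled cell template: its source lines mix template placeholders such as `{model_name}` and `{nsubjects}` with braces that belong to the generated notebook code itself (for example the f-string `f"{script_fn}"`). The sketch below assumes a simple token-by-token substitution; `fill_placeholders` is a hypothetical helper, not something the skill or pyEM provides, and the values come from the example prompt in the README.

```python
def fill_placeholders(text, values):
    """Replace only the named {placeholder} tokens; leave all other braces untouched."""
    for key, value in values.items():
        text = text.replace("{" + key + "}", str(value))
    return text


# Three source lines in the style of the setup cell in the notebook template.
template_lines = [
    'script_fn = "{model_name}"\n',
    "nsubj = {nsubjects}\n",
    'module = importlib.import_module(f"{script_fn}")\n',
]

# Illustrative values taken from the README example prompt.
values = {"model_name": "social_rw", "nsubjects": 100}

filled = [fill_placeholders(line, values) for line in template_lines]
for line in filled:
    print(line, end="")
```

Calling `str.format` on the same lines would raise `KeyError` on `{script_fn}`, because it treats every brace pair as a field, which is why the sketch substitutes known keys only.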
+ ## Template source + +Use `references/example-notebook-template.json` as the base cell template. Replace all placeholders (for example `{model_name}`, bounds, and parameter names). + +## Minimal recovery workflow + +1. Choose `N` synthetic subjects (e.g., `N=50`). +2. Sample true parameters in natural space. +3. Run the model's `mod_sim` to generate behavior. +4. Fit each synthetic subject with the model's `mod_fit` via `EMModel.recover`. +5. Compare true vs recovered parameters. + +## Plot requirements + +- One subplot per parameter. +- X-axis: true values. +- Y-axis: recovered values. +- Add identity line `y=x`. +- Report Pearson correlation `r` in each panel title. + +## Minimal plotting snippet + +```python +fig, axes = plt.subplots(1, n_params, figsize=(4 * n_params, 4)) +for i, ax in enumerate(np.atleast_1d(axes)): + ax.scatter(true_params[:, i], recovered_params[:, i], alpha=0.7) + lo = min(true_params[:, i].min(), recovered_params[:, i].min()) + hi = max(true_params[:, i].max(), recovered_params[:, i].max()) + ax.plot([lo, hi], [lo, hi], "k--", linewidth=1) + r = np.corrcoef(true_params[:, i], recovered_params[:, i])[0, 1] + ax.set_title(f"{param_names[i]} (r={r:.2f})") + ax.set_xlabel("True") + ax.set_ylabel("Recovered") +plt.tight_layout() +``` diff --git a/skills/pyem-model-generator/references/pyem-runtime-contract.md b/skills/pyem-model-generator/references/pyem-runtime-contract.md new file mode 100644 index 0000000..86cb4c9 --- /dev/null +++ b/skills/pyem-model-generator/references/pyem-runtime-contract.md @@ -0,0 +1,43 @@ +# Runtime contract for generated models + +Generated model files should import math helpers directly from pyem: + +```python +from pyem.utils.math import norm2alpha, norm2beta, softmax, calc_fval +``` + +The shared `modclass_utils.py` file should **not** define these math helpers. 
+It should only provide: + +- `_alloc_sim` +- `_alloc_fit` +- `ModelSpec` +- `ParamDef` +- `spec_to_id` +- `build_params` +- `PARAM_REGISTRY` + +## Function contracts + +### `mod_params(nsubj, rng=None)` + +- Returns `(param_names, param_xform, true_params)`. +- `true_params` shape: `(nsubj, nparams)`. + +### `mod_sim(params, ..., **kwargs)` + +- Returns a dictionary with stable keys appropriate to the model class. +- Common required outputs are `params` and `choices`; include `rewards` when the task/model uses reward feedback or when downstream fitting/diagnostics require it. +- Model-specific latent/diagnostic arrays such as `ev`, `pe`, and similar traces may be included, but no single latent key is required for all models. + +### `mod_fit(params, ..., prior=None, output="npl")` + +- Must support `output="npl"`, `"nll"`, and optionally `"all"`. +- Uses transformed parameters (`norm2alpha`, `norm2beta`) when constraints require. +- Returns a large penalty (commonly `1e7`) for invalid parameter regions. +- Uses `calc_fval` for scalar objective outputs. + +## Prior handling + +- Prior can be `None` or a prior-like object that implements a `logpdf` method compatible with `calc_fval`. +- Pass the prior object through unchanged to `calc_fval`. 
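To make the prior-handling contract concrete, here is a minimal sketch of what a prior-like object with a `logpdf` method could look like. Everything in it is an assumption made for illustration: `IndependentGaussianPrior` is a hypothetical class, and `objective_sketch` is a stand-in for the role `calc_fval` plays, not pyEM's implementation. It assumes `logpdf` returns a single summed log density and that the `"npl"` objective is the negative log-likelihood minus the log prior.

```python
import math


class IndependentGaussianPrior:
    """Hypothetical prior-like object: one independent normal per parameter."""

    def __init__(self, mu, sigma):
        self.mu = list(mu)
        self.sigma = list(sigma)

    def logpdf(self, params):
        """Return the summed log density of params under the independent normals."""
        total = 0.0
        for x, m, s in zip(params, self.mu, self.sigma):
            total += -0.5 * math.log(2.0 * math.pi * s * s) - 0.5 * ((x - m) / s) ** 2
        return total


def objective_sketch(nll, params, prior=None, output="npl"):
    """Illustrative stand-in for calc_fval (assumed semantics, not pyEM's code)."""
    if output == "npl" and prior is not None:
        # Penalize the negative log-likelihood by the negative log prior.
        return nll - prior.logpdf(params)
    return nll


prior = IndependentGaussianPrior(mu=[0.0, 0.0], sigma=[1.0, 1.0])
print(objective_sketch(10.0, [0.0, 0.0], prior=prior, output="npl"))
print(objective_sketch(10.0, [0.0, 0.0], prior=None, output="npl"))
```

The point of the sketch is the pass-through: `mod_fit` never inspects the prior, it only hands it on, so any object exposing a compatible `logpdf` can be swapped in without touching the model file.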
diff --git a/skills/pyem-model-generator/references/rl.json b/skills/pyem-model-generator/references/rl.json new file mode 100644 index 0000000..48fb2a5 --- /dev/null +++ b/skills/pyem-model-generator/references/rl.json @@ -0,0 +1,18 @@ +{ + "model_class": "rl", + "utils_file": "modclass_utils.py", + "models": [ + { + "model_name": "rw1a1b", + "model_file": "rw1a1b.py", + "notebook_file": "rw1a1b.ipynb", + "required_attributes": ["mod_desc", "mod_spec", "mod_id", "MODEL"], + "required_functions": ["mod_params", "mod_sim", "mod_fit"], + "shared_import": "from modclass_utils import _alloc_sim, _alloc_fit, ModelSpec, spec_to_id, build_params", + "math_import": "from pyem.utils.math import norm2alpha, norm2beta, softmax, calc_fval", + "parameters": ["beta", "alpha"], + "sim_outputs": ["params", "choices", "rewards", "ev", "ch_prob", "pe", "nll"], + "fit_outputs": ["npl", "nll", "all"] + } + ] +}