cmbemu-v2

Scientist: denario-6
Date: 2026-04-21

CMB Power Spectrum Emulator — Dataset Description (v2)

Overview

Train a neural-network emulator that maps six ΛCDM cosmological parameters to four CMB angular power spectra (TT, TE, EE, φφ). The emulator is scored on both precision (likelihood-level accuracy) and CPU inference speed via cmbemu.get_score(emulator).

Package and Data Access

The cmbemu package is installed in the Denario venv and must be imported as:

import sys
sys.path.insert(0, '/home/node/work/cmbemu/src/')
import cmbemu as cec

Training data (50,000 cosmologies, pre-cached — no download needed):

train = cec.load_train()
# Returns a dict with keys: params, tt, te, ee, pp, box_lo, box_hi, param_names, lmax_cmb, lmax_pp

Cached file location (use directly with np.load if needed):

  • /home/node/.cache/cmbemu/datasets--borisbolliet--cmbemu-competition-v1/snapshots/e45fd9a3f451038e3ce677a56f7f6cb81c8c47c9/train.npz

Dataset Shapes and Units

  • train['params']: float32, shape (50000, 6) — cosmological parameters in physical units
  • train['tt']: float32, shape (50000, 6001) — C_ℓ^TT, ℓ = 0..6000, units K²
  • train['te']: float32, shape (50000, 6001) — C_ℓ^TE
  • train['ee']: float32, shape (50000, 6001) — C_ℓ^EE
  • train['pp']: float32, shape (50000, 3001) — C_ℓ^φφ, ℓ = 0..3000
  • train['box_lo']: float32, shape (6,) — parameter box lower bounds
  • train['box_hi']: float32, shape (6,) — parameter box upper bounds
  • train['param_names']: ('omega_b', 'omega_cdm', 'H0', 'tau_reio', 'ln10^{10}A_s', 'n_s')

Concrete parameter ranges:

  • omega_b: [0.020, 0.025]
  • omega_cdm: [0.090, 0.150]
  • H0: [55.0, 85.0]
  • tau_reio: [0.030, 0.100]
  • ln10^{10}A_s: [2.700, 3.300]
  • n_s: [0.920, 1.020]

The ℓ=0 and ℓ=1 entries are present but ignored by the scorer: only indices 2..6000 matter for the CMB spectra, and 2..3000 for φφ.
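As a concrete illustration of which indices matter, the sketch below uses a synthetic TT spectrum (a stand-in for one row of train['tt']; the scorer's exact internals live inside cmbemu) and shows that an MAE computed over ℓ = 2..6000 is completely insensitive to the ℓ = 0, 1 entries.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for a true and a predicted TT spectrum
# (real data comes from cec.load_train(); shapes match the description above).
truth_tt = rng.random(6001, dtype=np.float32)
pred_tt = truth_tt + 0.01  # constant offset for illustration

# Corrupting the ℓ=0,1 entries changes nothing, because the comparison
# slices them away before computing the error.
pred_tt[:2] = 1e9
mae = np.abs(pred_tt[2:] - truth_tt[2:]).mean()
print(mae)  # ≈ 0.01, regardless of the ℓ=0,1 entries
```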

Emulator Interface

class MyEmulator:
    def predict(self, params: dict) -> dict:
        # params: {'omega_b': float, 'omega_cdm': float, 'H0': float,
        #          'tau_reio': float, 'ln10^{10}A_s': float, 'n_s': float}
        # returns: {'tt': np.ndarray (6001,), 'te': np.ndarray (6001,),
        #           'ee': np.ndarray (6001,), 'pp': np.ndarray (3001,)}
        ...
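A minimal emulator satisfying this interface is sketched below: it ignores the input parameters and returns the per-ℓ mean of each training spectrum (similar in spirit to, but not necessarily identical to, the ConstantPlanck baseline). Small synthetic arrays stand in for cec.load_train().

```python
import numpy as np

class MeanEmulator:
    """Baseline sketch: predict the per-ℓ training mean of each spectrum."""
    def __init__(self, train: dict):
        self.means = {k: train[k].mean(axis=0).astype(np.float32)
                      for k in ('tt', 'te', 'ee', 'pp')}

    def predict(self, params: dict) -> dict:
        # A real emulator would condition on params; this baseline does not.
        return {k: v.copy() for k, v in self.means.items()}

# Synthetic stand-in for cec.load_train() (the real arrays have 50000 rows)
rng = np.random.default_rng(1)
fake_train = {k: rng.random((100, n), dtype=np.float32)
              for k, n in [('tt', 6001), ('te', 6001), ('ee', 6001), ('pp', 3001)]}

emu = MeanEmulator(fake_train)
out = emu.predict({'omega_b': 0.0224, 'omega_cdm': 0.12, 'H0': 67.0,
                   'tau_reio': 0.054, 'ln10^{10}A_s': 3.04, 'n_s': 0.965})
print(out['tt'].shape, out['pp'].shape)  # (6001,) (3001,)
```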

Scoring

acc = cec.get_accuracy_score(emu)   # precision only, fast
tim = cec.get_time_score(emu)       # CPU timing (single-threaded)
full = cec.get_score(emu)           # combined_S = log10(mae_total) + log10(max(t_cpu_ms, 1.0))

Baseline (ConstantPlanck): mae_total ≈ 1.13×10⁷. Any real emulator should achieve mae_total << 10⁶.
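To make the combined-score formula above concrete, here is the arithmetic as a small helper (the numbers are illustrative, not measured):

```python
import numpy as np

def combined_score(mae_total: float, t_cpu_ms: float) -> float:
    # combined_S = log10(mae_total) + log10(max(t_cpu_ms, 1.0)).
    # Lower is better; sub-millisecond timings are floored at 1 ms.
    return float(np.log10(mae_total) + np.log10(max(t_cpu_ms, 1.0)))

print(combined_score(1e3, 5.0))  # 3.0 + log10(5) ≈ 3.699
print(combined_score(1e3, 0.2))  # timing floored at 1 ms → 3.0
```

Note that below the 1 ms floor, further speedups stop paying off and only accuracy improvements reduce the score.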

═══════════════════════════════════════════════

MANDATORY: GPU SETUP FOR TRAINING

═══════════════════════════════════════════════

Every training script MUST begin with the following lines — BEFORE any other imports — to force JAX onto the GPU:

import os
os.environ['JAX_PLATFORMS'] = 'cuda'  # MUST be before importing jax
import jax
# Verify GPU is active:
assert len([d for d in jax.devices() if 'cuda' in str(d).lower()]) > 0, \
    f"GPU not found, devices = {jax.devices()}"
print("Training on:", jax.devices())

This snippet has been tested and confirmed to work in this environment:

  • Hardware: 1× NVIDIA RTX PRO 6000 Blackwell, 96 GB VRAM, CUDA 13.0
  • JAX 0.10.0 with jax-cuda12-plugin 0.10.0 installed
  • Expected output: Training on: [CudaDevice(id=0)]

The env var must be set BEFORE import jax. Setting it after will have no effect. Do not use jax.config.update('jax_platform_name', 'gpu') — it is deprecated. Use only the os.environ approach above.

For scoring/timing, JAX must be pinned to CPU (scorer requirement):

os.environ['JAX_PLATFORMS'] = 'cpu'

Set this at the top of any script that calls cec.get_time_score or cec.get_score. The accuracy scorer (cec.get_accuracy_score) can run on GPU.

═══════════════════════════════════════════════

RECOMMENDED APPROACH: DIRECT LOG-Dℓ EMULATION

═══════════════════════════════════════════════

The recommended approach is to emulate Dℓ = ℓ(ℓ+1)Cℓ/(2π) in log space. This is simpler, numerically stable, and avoids complex inverse-transform bugs.

Step 1: Preprocessing (do once, save to disk)

import numpy as np
train = cec.load_train()

ell_cmb = np.arange(6001)
ell_pp  = np.arange(3001)
factor_cmb = np.where(ell_cmb >= 2, ell_cmb*(ell_cmb+1)/(2*np.pi), 1.0)
factor_pp  = np.where(ell_pp  >= 2, ell_pp *(ell_pp +1)/(2*np.pi), 1.0)

# Convert Cℓ → Dℓ (ℓ ≥ 2 only; ℓ=0,1 are zero/irrelevant)
D_tt = train['tt'] * factor_cmb   # shape (N, 6001)
D_te = train['te'] * factor_cmb
D_ee = train['ee'] * factor_cmb
D_pp = train['pp'] * factor_pp    # shape (N, 3001)

# Log-transform: the TT, EE, and φφ Dℓ spectra are strictly positive for ℓ ≥ 2,
# so take their log. TE changes sign, so keep it linear (or use asinh).
log_D_tt = np.log(D_tt[:, 2:])   # shape (N, 5999)
log_D_ee = np.log(D_ee[:, 2:])
log_D_pp = np.log(D_pp[:, 2:])   # shape (N, 2999)
D_te_lin = D_te[:, 2:]           # shape (N, 5999) — keep linear for TE

# Concatenate into flat target vector
targets_raw = np.concatenate([log_D_tt, D_te_lin, log_D_ee, log_D_pp], axis=1)
# Shape: (N, 5999 + 5999 + 5999 + 2999) = (N, 20996)

# Normalize inputs: min-max scale to [0, 1]
box_lo = train['box_lo']  # shape (6,)
box_hi = train['box_hi']
X = (train['params'] - box_lo) / (box_hi - box_lo)   # shape (N, 6), range [0,1]

# Standardize targets: zero mean, unit variance
target_mean = targets_raw.mean(axis=0)   # shape (20996,)
target_std  = targets_raw.std(axis=0)
target_std  = np.where(target_std < 1e-8, 1e-8, target_std)
Y = (targets_raw - target_mean) / target_std   # shape (N, 20996)
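Since this preprocessing is "do once, save to disk", a simple way to persist the products is np.savez, reloading them in the training script without redoing the transform. The sketch below uses small synthetic arrays and an arbitrary temp-file path in place of the real X, Y, target_mean, and target_std.

```python
import os
import tempfile
import numpy as np

rng = np.random.default_rng(2)
X = rng.random((100, 6), dtype=np.float32)                 # stand-in for scaled params
Y = rng.standard_normal((100, 20996)).astype(np.float32)   # stand-in for targets
target_mean = rng.random(20996).astype(np.float32)
target_std = (rng.random(20996) + 0.5).astype(np.float32)

# Save everything the training script and predict() will need
path = os.path.join(tempfile.mkdtemp(), 'preprocessed.npz')
np.savez(path, X=X, Y=Y, target_mean=target_mean, target_std=target_std)

# Later, in the training script:
cache = np.load(path)
print(cache['X'].shape, cache['Y'].shape)  # (100, 6) (100, 20996)
```

np.savez round-trips float32 arrays exactly, so the reloaded target_mean/target_std can be embedded in the emulator without any loss.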

Step 2: The predict() method — CRITICAL

The predict() method must invert ALL preprocessing transforms. This is where bugs typically occur. Here is the exact inverse:

PARAM_ORDER = ('omega_b', 'omega_cdm', 'H0', 'tau_reio', 'ln10^{10}A_s', 'n_s')

def predict(self, params_dict: dict) -> dict:
    # 1. Scale input parameters to [0, 1]
    x = np.array([params_dict[k] for k in PARAM_ORDER], dtype=np.float32)
    x_scaled = (x - self.box_lo) / (self.box_hi - self.box_lo)  # shape (6,)

    # 2. Run neural network → standardized log-Dℓ predictions
    y_std = self._nn_predict(x_scaled)   # shape (20996,)

    # 3. Undo standardization
    y_raw = y_std * self.target_std + self.target_mean   # shape (20996,)

    # 4. Unpack the 4 segments
    log_D_tt = y_raw[0:5999]
    D_te_lin = y_raw[5999:11998]
    log_D_ee = y_raw[11998:17997]
    log_D_pp = y_raw[17997:20996]

    # 5. Undo log transform
    D_tt = np.exp(log_D_tt)   # shape (5999,)
    D_ee = np.exp(log_D_ee)
    D_pp = np.exp(log_D_pp)   # shape (2999,)
    D_te = D_te_lin            # already linear

    # 6. Convert Dℓ back to Cℓ: Cℓ = Dℓ / (ℓ(ℓ+1)/(2π))
    ell_cmb = np.arange(2, 6001)
    ell_pp  = np.arange(2, 3001)
    factor_cmb = ell_cmb * (ell_cmb + 1) / (2 * np.pi)
    factor_pp  = ell_pp  * (ell_pp  + 1) / (2 * np.pi)

    C_tt = D_tt / factor_cmb   # shape (5999,)
    C_te = D_te / factor_cmb
    C_ee = D_ee / factor_cmb
    C_pp = D_pp / factor_pp    # shape (2999,)

    # 7. Pad ℓ=0,1 with zeros to restore full shape
    pad2 = np.zeros(2, dtype=np.float32)
    return {
        'tt': np.concatenate([pad2, C_tt]).astype(np.float32),  # shape (6001,)
        'te': np.concatenate([pad2, C_te]).astype(np.float32),
        'ee': np.concatenate([pad2, C_ee]).astype(np.float32),
        'pp': np.concatenate([pad2, C_pp]).astype(np.float32),  # shape (3001,)
    }
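Before trusting predict() at all, it is worth verifying that steps 3–6 exactly invert the Step 1 preprocessing: with a "perfect network" (y_std set to the true standardized target), the round trip must reproduce the input Cℓ to float rounding. A self-contained sketch with one synthetic positive spectrum:

```python
import numpy as np

rng = np.random.default_rng(3)
ell = np.arange(2, 6001)
factor = ell * (ell + 1) / (2 * np.pi)

# Synthetic strictly positive TT-like spectrum (stand-in for one training row)
C_true = rng.random(5999) + 0.1

# Forward (Step 1): Cℓ → Dℓ → log → standardize
log_D = np.log(C_true * factor)
mean, std = log_D.mean(), log_D.std()
y_std = (log_D - mean) / std

# Inverse (steps 3-6): unstandardize → exp → divide by factor
C_back = np.exp(y_std * std + mean) / factor

print(np.abs(C_back - C_true).max())  # machine-precision level
```

If the round-trip error is not at machine-precision level, fix the transform before spending any GPU time on training.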

ALWAYS verify the predict() method is correct before training by running:

# Sanity check: predict a training point and compare to ground truth
i = 0
params_dict = {k: float(train['params'][i, j]) for j, k in enumerate(PARAM_ORDER)}
pred = emu.predict(params_dict)
truth_tt = train['tt'][i]  # shape (6001,)
rel_err_tt = np.abs(pred['tt'][2:] - truth_tt[2:]) / (np.abs(truth_tt[2:]) + 1e-30)
print("Max relative TT error on training point:", rel_err_tt.max())
# For a well-trained model this should be << 1. For a freshly initialized model it will be large.
# If it's exactly identical to ConstantPlanck output, the inverse transform is wrong.

Architecture Recommendations

  • MLP with 6→[512→512→512→512]→20996 (residual connections, LayerNorm, GELU)
  • Or: MLP encoder (6→256) + separate heads for TT/TE/EE (each 256→5999) and PP (256→2999)
  • Use mini-batches of 1024–4096 (GPU memory allows it)
  • Adam/AdamW with cosine LR schedule; ~200–500 epochs
  • Monitor cec.get_accuracy_score every 50 epochs on a held-out 5k validation set
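To make the first architecture option concrete, here is a forward-pass sketch in plain NumPy with random weights: pre-LayerNorm residual blocks with GELU, mapping the 6 scaled parameters to the 20996 standardized targets. The real model would be written and trained in JAX; only the wiring and shapes are shown here.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def layernorm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

rng = np.random.default_rng(0)
d_in, d_h, d_out, n_blocks = 6, 512, 20996, 4

# Random weights for illustration; a trained model would load these from disk
W_in = rng.standard_normal((d_in, d_h)) * 0.1
Ws = [rng.standard_normal((d_h, d_h)) * 0.05 for _ in range(n_blocks)]
W_out = rng.standard_normal((d_h, d_out)) * 0.01

def forward(x):                      # x: scaled params, shape (6,)
    h = x @ W_in
    for W in Ws:                     # 4 pre-LN residual blocks, width 512
        h = h + gelu(layernorm(h) @ W)
    return h @ W_out                 # standardized log-Dℓ targets, shape (20996,)

y = forward(rng.random(d_in))
print(y.shape)  # (20996,)
```

The residual connections keep gradients well-conditioned at this depth, and pre-LN lets the blocks start near identity at initialization.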

Additional Data Generation

# Generate 50k extra cosmologies (takes ~minutes on this machine)
extra = cec.generate_data(n=50000, seed=99)
# Returns same dict format as load_train()

Hardware

  • 1× NVIDIA RTX PRO 6000 Blackwell, 96 GB VRAM
  • 64 CPU cores, 128 GB RAM
  • CUDA 13.0, JAX 0.10.0 + jax-cuda12-plugin
  • Use batch sizes of 4096–16384 to fully utilise GPU throughput
