Fast kernel bandwidth selection via analytic Hessian Newton optimization. 16× faster for KDE, 25-49× faster for NW regression at n≥10,000.
```shell
pip install hbw
```

```python
import numpy as np
from hbw import kde_bandwidth, nw_bandwidth

# KDE bandwidth selection
x = np.random.randn(1000)
h = kde_bandwidth(x)
print(f"Optimal KDE bandwidth: {h:.4f}")

# Nadaraya-Watson regression bandwidth
x = np.linspace(-2, 2, 500)
y = np.sin(2 * x) + 0.3 * np.random.randn(len(x))
h = nw_bandwidth(x, y)
print(f"Optimal NW bandwidth: {h:.4f}")

# Large datasets: automatic subsampling
x_large = np.random.randn(100_000)
h = kde_bandwidth(x_large, max_n=5000, seed=42)  # uses 5,000 random points

# Multivariate KDE (2D example)
from hbw import kde_bandwidth_mv

X = np.random.randn(500, 2)
h = kde_bandwidth_mv(X)
print(f"Optimal 2D KDE bandwidth: {h:.4f}")

# Multivariate NW regression (2D predictors)
from hbw import nw_bandwidth_mv

X = np.random.randn(500, 2)
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.3 * np.random.randn(500)
h = nw_bandwidth_mv(X, y)
print(f"Optimal 2D NW bandwidth: {h:.4f}")
```

Silverman's rule of thumb assumes your data is unimodal and roughly Gaussian. Use hbw when:
- Multimodal distributions: Silverman oversmooths multiple peaks into a single blob. LSCV adapts to reveal distinct modes.
- Non-Gaussian data: Heavy tails or skewness cause Silverman to choose suboptimal bandwidths. Cross-validation optimizes for your actual data shape.
- Regression (Nadaraya-Watson): Silverman's rule is designed for density estimation, not the x-y relationship in regression. NW bandwidth selection requires data-driven LOOCV.
- Bootstrap/uncertainty quantification: Each resample has different structure, and rule-of-thumb bandwidths don't adapt. Per-resample CV selection is essential, and Newton optimization makes it practical.
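To see the multimodal failure mode concretely, here is a self-contained sketch in plain NumPy (a naive grid minimization of the Gaussian-kernel LSCV objective, not hbw's Newton implementation) comparing Silverman's rule against LSCV on a bimodal sample:

```python
import numpy as np

rng = np.random.default_rng(0)
# Bimodal sample: equal mixture of N(-2, 0.5^2) and N(2, 0.5^2)
x = np.concatenate([rng.normal(-2, 0.5, 150), rng.normal(2, 0.5, 150)])
n = len(x)

h_silverman = 1.06 * x.std() * n ** (-1 / 5)  # rule of thumb

d = x[:, None] - x[None, :]  # pairwise differences, shape (n, n)

def lscv(h):
    # Integral term: a Gaussian kernel convolved with itself is N(0, 2h^2)
    integral = np.exp(-(d / h) ** 2 / 4).sum() / (n ** 2 * h * np.sqrt(4 * np.pi))
    # Leave-one-out term: subtract the self contribution K(0) from each row
    k = np.exp(-(d / h) ** 2 / 2) / np.sqrt(2 * np.pi)
    loo = (k.sum(axis=1) - k[0, 0]) / ((n - 1) * h)
    return integral - 2 * loo.mean()

grid = np.linspace(0.05, 1.5, 60)
h_lscv = grid[int(np.argmin([lscv(h) for h in grid]))]
print(f"Silverman: {h_silverman:.3f}, LSCV: {h_lscv:.3f}")
```

Silverman's rule sees the large overall spread of the mixture and returns a bandwidth several times larger than the LSCV minimizer, smearing the two peaks together.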
`kde_bandwidth`: Select the optimal KDE bandwidth via LSCV minimization.
| Parameter | Type | Description |
|---|---|---|
| `x` | array-like | Sample data |
| `kernel` | str | `"gauss"`, `"epan"`, `"unif"`, `"biweight"`, `"triweight"`, or `"cosine"` |
| `h0` | float | Initial bandwidth (default: Silverman's rule) |
| `max_n` | int | Subsample size for large data (`None` to disable) |
| `seed` | int | Random seed for reproducible subsampling |
Returns: float - optimal bandwidth
`nw_bandwidth`: Select the optimal Nadaraya-Watson bandwidth via LOOCV-MSE minimization.
| Parameter | Type | Description |
|---|---|---|
| `x` | array-like | Predictor values |
| `y` | array-like | Response values |
| `kernel` | str | `"gauss"`, `"epan"`, `"unif"`, `"biweight"`, `"triweight"`, or `"cosine"` |
| `h0` | float | Initial bandwidth (default: Silverman's rule) |
| `max_n` | int | Subsample size for large data |
| `seed` | int | Random seed |
Returns: float - optimal bandwidth
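The LOOCV-MSE objective being minimized can be written out directly. A minimal NumPy sketch of a naive O(n²) grid evaluation (for intuition only; hbw's Newton solver avoids the grid):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-2, 2, 200)
y = np.sin(2 * x) + 0.3 * rng.standard_normal(200)
d = x[:, None] - x[None, :]  # pairwise predictor differences

def loocv_mse(h):
    w = np.exp(-(d / h) ** 2 / 2)    # Gaussian kernel weights
    np.fill_diagonal(w, 0.0)         # leave one out: no self-weight
    y_hat = (w @ y) / w.sum(axis=1)  # NW prediction at each x_i
    return np.mean((y - y_hat) ** 2)

grid = np.linspace(0.05, 2.0, 40)
losses = [loocv_mse(h) for h in grid]
h_best = grid[int(np.argmin(losses))]
print(f"LOOCV-optimal h: {h_best:.3f}")
```

An oversmoothed bandwidth (here, h = 2 on a domain of width 4) flattens the sine structure toward a global mean, so its LOOCV-MSE is far worse than the minimizer's.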
Compute LSCV score, gradient, and Hessian for KDE.
Returns: tuple[float, float, float] - (score, gradient, hessian)
Compute LOOCV-MSE, gradient, and Hessian for NW regression.
Returns: tuple[float, float, float] - (loss, gradient, hessian)
`kde_bandwidth_mv`: Select the optimal multivariate KDE bandwidth via LSCV minimization with a product kernel.
| Parameter | Type | Description |
|---|---|---|
| `data` | array-like | Sample data, shape (n, d) |
| `kernel` | str | `"gauss"`, `"epan"`, `"unif"`, `"biweight"`, `"triweight"`, or `"cosine"` |
| `h0` | float | Initial bandwidth (default: Scott's rule) |
| `max_n` | int | Subsample size for large data |
| `seed` | int | Random seed |
| `standardize` | bool | Standardize each dimension to unit variance |
Returns: float - optimal isotropic bandwidth
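For intuition on the product-kernel construction with a single isotropic bandwidth, a minimal plain-NumPy sketch (illustrative only; the function name `kde_product` and the fixed h below are not part of hbw's API):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((500, 2))
# Standardize each dimension, as the standardize flag does
Z = (X - X.mean(axis=0)) / X.std(axis=0)

def kde_product(points, data, h):
    # Product of 1-D Gaussian kernels, one per dimension, same h in each:
    # f(x) = (1 / (n h^d)) * sum_i prod_j K((x_j - X_ij) / h)
    u = (points[:, None, :] - data[None, :, :]) / h  # shape (m, n, d)
    k = np.exp(-u ** 2 / 2) / np.sqrt(2 * np.pi)    # 1-D kernel values
    return k.prod(axis=2).mean(axis=1) / h ** data.shape[1]

dens = kde_product(np.array([[0.0, 0.0], [4.0, 4.0]]), Z, h=0.4)
print(dens)  # density near the center, then far out in the tail
```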
Compute LSCV score, gradient, and Hessian for multivariate KDE.
Returns: tuple[float, float, float] - (score, gradient, hessian)
`nw_bandwidth_mv`: Select the optimal multivariate NW bandwidth via LOOCV-MSE minimization with a product kernel.
| Parameter | Type | Description |
|---|---|---|
| `data` | array-like | Predictor values, shape (n, d) |
| `y` | array-like | Response values |
| `kernel` | str | `"gauss"`, `"epan"`, `"unif"`, `"biweight"`, `"triweight"`, or `"cosine"` |
| `h0` | float | Initial bandwidth (default: Scott's rule) |
| `max_n` | int | Subsample size for large data |
| `seed` | int | Random seed |
| `standardize` | bool | Standardize each predictor dimension to unit variance |
Returns: float - optimal isotropic bandwidth
Compute LOOCV-MSE, gradient, and Hessian for multivariate NW regression.
Returns: tuple[float, float, float] - (loss, gradient, hessian)
Problem: Cross-validation bandwidth selection requires O(n²) per evaluation. Grid search needs 50-100 evaluations.
Solution: We derive closed-form gradients and Hessians for the LSCV (KDE) and LOOCV-MSE (NW) objectives. Newton optimization converges in 6-12 iterations, but the key insight is that each iteration shares O(n²) pairwise computations across objective, gradient, and Hessian—yielding speedups that grow with sample size (16× for KDE, 25-49× for NW at n≥10,000).
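The Newton-Armijo loop itself is short once a single shared pass returns (objective, gradient, Hessian). A sketch on a toy smooth stand-in objective (illustrative only; hbw's actual objective functions and internal names are not shown here):

```python
import numpy as np

def objective(h):
    # Toy stand-in for LSCV(h): smooth in h, minimized at h = 0.5.
    # In hbw, the real score, gradient, and Hessian share one O(n^2) pass.
    s = np.log(h) - np.log(0.5)
    return s ** 2, 2 * s / h, (2 - 2 * s) / h ** 2

h = 1.0  # e.g. Silverman's rule as the starting point
for _ in range(50):
    f, g, H = objective(h)
    step = -g / H if H > 0 else -g  # fall back to gradient if H <= 0
    t = 1.0
    # Armijo backtracking: halve the step until h stays positive and
    # the objective decreases sufficiently
    while h + t * step <= 0 or objective(h + t * step)[0] > f + 1e-4 * t * g * step:
        t *= 0.5
    h = h + t * step
    if abs(g) < 1e-10:
        break
print(f"converged to h = {h:.6f}")
```

With an analytic Hessian the step length is essentially free, so the loop typically terminates in under a dozen iterations, matching the 6-12 figure above.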
Supported kernels:
- Gaussian: `K(u) = exp(-u²/2) / √(2π)`
- Epanechnikov: `K(u) = 0.75(1 - u²)` for |u| ≤ 1
- Uniform: `K(u) = 0.5` for |u| ≤ 1
- Biweight: `K(u) = (15/16)(1 - u²)²` for |u| ≤ 1
- Triweight: `K(u) = (35/32)(1 - u²)³` for |u| ≤ 1
- Cosine: `K(u) = (π/4)cos(πu/2)` for |u| ≤ 1
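Each of these is a valid density (it integrates to 1). The following sketch spells the six standard forms out in NumPy and checks the normalization numerically (these are the textbook definitions, not hbw's internal code):

```python
import numpy as np

KERNELS = {
    "gauss":     lambda u: np.exp(-u ** 2 / 2) / np.sqrt(2 * np.pi),
    "epan":      lambda u: 0.75 * (1 - u ** 2) * (np.abs(u) <= 1),
    "unif":      lambda u: 0.5 * (np.abs(u) <= 1),
    "biweight":  lambda u: (15 / 16) * (1 - u ** 2) ** 2 * (np.abs(u) <= 1),
    "triweight": lambda u: (35 / 32) * (1 - u ** 2) ** 3 * (np.abs(u) <= 1),
    "cosine":    lambda u: (np.pi / 4) * np.cos(np.pi * u / 2) * (np.abs(u) <= 1),
}

u = np.linspace(-8, 8, 160_001)  # wide grid so the Gaussian tail is covered
du = u[1] - u[0]
totals = {name: (k(u) * du).sum() for name, k in KERNELS.items()}
for name, total in totals.items():
    print(f"{name:9s} integrates to {total:.4f}")  # Riemann-sum approximation
```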
For full mathematical details, see the paper.
Newton-Armijo with analytic Hessian achieves identical accuracy to grid search with speedups that grow with sample size. All implementations use Numba with parallel execution.
Speedup vs. Grid Search (Gaussian kernel):
| n | KDE Speedup | NW Speedup |
|---|---|---|
| 100 | 0.3× | 1.1× |
| 500 | 4.2× | 7.8× |
| 2,000 | 7.9× | 28× |
| 5,000 | 8.7× | 26× |
| 10,000 | 16× | 25× |
| 20,000 | — | 49× |
Bootstrap use case: 200 resamples at n=10,000 takes ~100 minutes with grid search vs. ~4 minutes with Newton.
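The bootstrap pattern looks like the following. Here the per-resample selector is stubbed with Silverman's rule so the sketch runs without hbw installed; in practice you would call `kde_bandwidth(resample, max_n=5000, seed=b)` inside the loop:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(2_000)

def select_bandwidth(sample):
    # Stand-in for hbw.kde_bandwidth: Silverman's rule on the resample
    n = len(sample)
    return 1.06 * sample.std() * n ** (-1 / 5)

B = 200
hs = np.empty(B)
for b in range(B):
    resample = rng.choice(x, size=len(x), replace=True)
    hs[b] = select_bandwidth(resample)

lo, hi = np.percentile(hs, [2.5, 97.5])
print(f"bandwidth 95% bootstrap interval: [{lo:.3f}, {hi:.3f}]")
```

With CV selection per resample, the 200 inner calls dominate the cost, which is exactly where the Newton speedup pays off.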
Tested across sample sizes, noise levels, four DGPs (bimodal, unimodal, skewed, heavy-tailed), and all six kernels. See ms/ for full details.
- Multivariate KDE/NW: Isotropic bandwidth only (same h in all dimensions)
- Not supported: Anisotropic bandwidth (dimension-specific bandwidths), local/adaptive bandwidth selection
```bibtex
@misc{hbw2025,
  author = {Sood, Gaurav},
  title = {Analytic-Hessian Bandwidth Selection for Kernel Density Estimation and Nadaraya-Watson Regression},
  year = {2025},
  url = {https://github.com/finite-sample/hbw}
}
```

MIT