Bayesian Adaptive Neural Decision Engine for eXploration & Action
Disclaimer: Bandexa is experimental, evolving software. Use at your own risk. It comes with no guarantees of any kind, including correctness, security or suitability for production. See the LICENSE and docs/privacy.md for more information.
Bandexa is a PyTorch-native contextual bandits library focused on Neural-Linear Thompson Sampling: learn a neural representation with backprop, while keeping uncertainty + exploration tractable via a Bayesian linear regression posterior on top of learned features.
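A contextual bandit chooses an action given an observed context, receives a reward for that action only, and must balance exploration against exploitation. Common approaches to deep contextual bandits include: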
- Bootstrapped / ensemble methods (sample a model/head and act greedily)
- Dropout as approximate Bayesian inference (keep dropout active at inference as an uncertainty proxy)
- Neural Thompson Sampling variants (TS-style exploration using a neural tangent kernel)
- Neural-Linear (neural representation + Bayesian linear head for closed-form posterior sampling)
Bandexa’s core emphasis is the Neural-Linear family.
Neural-Linear TS splits the problem into:
- Representation learning (neural encoder): learn features $z(x,a)$ via SGD / backprop.
- Uncertainty + exploration (Bayesian linear regression head): keep a conjugate BLR posterior over linear weights for TS, enabling closed-form updates and efficient sampling.
This design is popular because it keeps uncertainty in the final layer efficient and well-behaved while letting the encoder model complex nonlinear structure. It is also widely studied and used as a strong baseline in the deep contextual bandits literature.
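To make the closed-form update and sampling step concrete, here is a minimal sketch of a Bayesian linear regression head over fixed features. It is not Bandexa's implementation: the class name `BLRHead`, its parameters (`prior_precision`, `noise_var`), and the assumption of a known noise variance with an isotropic Gaussian prior are all illustrative choices.

```python
import torch


class BLRHead:
    """Hypothetical conjugate BLR head over features z (known noise variance)."""

    def __init__(self, dim: int, prior_precision: float = 1.0, noise_var: float = 1.0):
        self.noise_var = noise_var
        # Posterior precision Lambda = prior_precision * I + sum_i z_i z_i^T / noise_var
        self.precision = prior_precision * torch.eye(dim, dtype=torch.float64)
        # b = sum_i z_i * r_i / noise_var, so the posterior mean is Lambda^{-1} b
        self.b = torch.zeros(dim, dtype=torch.float64)

    def update(self, z: torch.Tensor, r: float) -> None:
        """Closed-form update after observing reward r for features z."""
        z = z.to(torch.float64)
        self.precision += torch.outer(z, z) / self.noise_var
        self.b += z * r / self.noise_var

    def posterior(self):
        """Return the posterior mean and covariance of the linear weights."""
        cov = torch.linalg.inv(self.precision)
        cov = 0.5 * (cov + cov.T)  # symmetrize against round-off
        return cov @ self.b, cov

    def sample_weights(self, generator=None) -> torch.Tensor:
        """Draw one weight vector w ~ N(mean, cov)."""
        mean, cov = self.posterior()
        chol = torch.linalg.cholesky(cov)
        eps = torch.randn(mean.shape, dtype=torch.float64, generator=generator)
        return mean + chol @ eps

    def select(self, Z: torch.Tensor, generator=None) -> int:
        """Thompson Sampling: act greedily with respect to one posterior sample."""
        w = self.sample_weights(generator)
        return int(torch.argmax(Z.to(torch.float64) @ w))


head = BLRHead(dim=2)
head.update(torch.tensor([1.0, 0.0]), 1.0)
mean, cov = head.posterior()

# Three candidate feature vectors; pick one by sampled score.
Z = torch.tensor([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
j = head.select(Z, generator=torch.Generator().manual_seed(0))
```

With the default prior, the single update above gives a posterior precision of `diag(2, 1)`, so the mean is `[0.5, 0]` and the covariance is `diag(0.5, 1)`. In a Neural-Linear setup the rows of `Z` would come from the encoder rather than being written by hand.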
Bandexa is intentionally PyTorch-first: tensors, modules, training loops, and batching are written in idiomatic PyTorch. The primary goal is a clean, reliable implementation for PyTorch workflows (CPU/GPU).
At a high level you:
- Define an Encoder `E` that can produce features for an (x, a) pair:
  - `encode(x, a) -> z`
  - and/or `encode_batch(x, actions) -> Z` (recommended for large candidate sets)
- Create a replay buffer (for training the encoder with supervised loss)
- Construct a NeuralLinear TS bandit:
  - `bandit = NeuralLinearTS(encoder=E, buffer=..., config=...)`
- Online loop:
  - `j = bandit.select_action(x, candidate_actions)`
  - observe reward `r`
  - `bandit.update(x, a_j, r)` (Bayesian posterior update + store transition)
  - periodically: `bandit.train_encoder(...)` and `bandit.rebuild_posterior(...)`
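As an illustration of the encoder step, the sketch below shows one way an encoder exposing `encode` and `encode_batch` might look. The class name `ConcatMLPEncoder`, the tensor shapes, and the concatenate-then-MLP design are assumptions made for this example; the interface Bandexa actually expects may differ, so check `docs/architecture.md` before relying on it.

```python
import torch
from torch import nn


class ConcatMLPEncoder(nn.Module):
    """Hypothetical encoder: concatenate context and action, then apply an MLP."""

    def __init__(self, ctx_dim: int, act_dim: int, feat_dim: int = 16, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(ctx_dim + act_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, feat_dim),
        )

    def encode(self, x: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
        """Assumed shapes: x (ctx_dim,), a (act_dim,) -> z (feat_dim,)."""
        return self.encode_batch(x, a.unsqueeze(0)).squeeze(0)

    def encode_batch(self, x: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
        """Assumed shapes: x (ctx_dim,), actions (K, act_dim) -> Z (K, feat_dim)."""
        # Repeat the single context once per candidate action, without copying memory.
        x_rep = x.unsqueeze(0).expand(actions.shape[0], -1)
        return self.net(torch.cat([x_rep, actions], dim=-1))


torch.manual_seed(0)
E = ConcatMLPEncoder(ctx_dim=4, act_dim=3)
x = torch.randn(4)
actions = torch.randn(5, 3)
with torch.no_grad():
    Z = E.encode_batch(x, actions)   # features for all 5 candidates in one pass
    z0 = E.encode(x, actions[0])     # features for a single (x, a) pair
```

Routing `encode` through `encode_batch` keeps the two paths consistent, and the batched path is the one that matters when the candidate set is large.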
### Pseudocode ###
```python
E = MyEncoder(...)  # torch.nn.Module
buffer = MemoryReplayBuffer(...)
bandit = NeuralLinearTS(encoder=E, buffer=buffer, config=...)

# simulation
for t in range(T):
    x = get_context()
    A = get_candidate_actions()  # shape (K, act_dim)
    j = bandit.select_action(x, A)
    r = env.step(x, A[j])
    bandit.update(x, A[j], r)
    if t % train_every_n == 0:
        bandit.train_encoder(...)
        bandit.rebuild_posterior(...)
```

See the `examples/` directory in this repository for runnable scripts (simulations).
Start with examples/README.md (example index + what each script demonstrates).
A key pattern demonstrated in examples is a realistic two-stage system:
candidate generation / retrieval → Thompson re-ranking on the candidate set.
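The two-stage pattern can be sketched roughly as follows. This is not taken from the repository's examples: the function names `retrieve_candidates` and `thompson_rerank`, the dot-product retrieval, and the use of raw item embeddings as features are all assumptions chosen to keep the example short.

```python
import torch


def retrieve_candidates(query: torch.Tensor, item_embeddings: torch.Tensor, k: int) -> torch.Tensor:
    """Stage 1 (hypothetical): cheap retrieval, here top-k by dot-product similarity."""
    scores = item_embeddings @ query
    return torch.topk(scores, k).indices


def thompson_rerank(Z: torch.Tensor, mean: torch.Tensor, cov: torch.Tensor, generator=None) -> torch.Tensor:
    """Stage 2 (hypothetical): sample weights once from N(mean, cov), sort by sampled score."""
    chol = torch.linalg.cholesky(cov)
    w = mean + chol @ torch.randn(mean.shape, generator=generator)
    return torch.argsort(Z @ w, descending=True)


g = torch.Generator().manual_seed(0)
items = torch.randn(1000, 8, generator=g)   # synthetic item catalogue
query = torch.randn(8, generator=g)         # synthetic context / query vector

cand = retrieve_candidates(query, items, k=20)
# For illustration the candidate features are the raw item embeddings;
# in a Neural-Linear setup they would be encoder features z(x, a).
Z = items[cand]
order = thompson_rerank(Z, mean=torch.zeros(8), cov=torch.eye(8), generator=g)
ranked = cand[order]                        # catalogue indices, best sampled score first
```

Retrieval keeps the expensive posterior-sampled scoring limited to `k` items, while the re-ranking stage still explores within that shortlist because each call draws a fresh weight sample.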
- `docs/architecture.md`: how the pieces fit together
- `docs/development.md`: local dev workflow
- `docs/privacy.md`: privacy posture for the library
- `docs/roadmap.md`: future work / milestones
- `docs/references/`: mathematical background notes (e.g., Bayesian linear regression used by Neural-Linear TS)
MIT License (see LICENSE).
Attribution is not required. If you use Bandexa in research or a public project and decide to give credit, you can credit it as: Keyvan Rahmani, Bandexa (2026).