Data-driven identification of memristive device behavior using XGBoost, Bidirectional LSTMs, Temporal Transformers, and Physics-Informed Neural Networks
Memristors are two-terminal nonlinear devices whose resistance depends on the history of applied voltage and current — a property that makes them compelling candidates for neuromorphic computing, non-volatile memory, and hardware neural networks. Yet their highly nonlinear, hysteretic behavior renders classical analytical models computationally expensive and difficult to generalize across fabrication variants.
This work presents a complete machine learning framework for data-driven memristor modeling. We address three interconnected problems: (1) regression — predicting instantaneous current I from voltage V and device parameters; (2) classification — identifying high- and low-resistance states (HRS/LRS) from electrical measurements; and (3) temporal dynamics — capturing time-dependent behavior using deep sequential models. We further introduce a Physics-Informed Neural Network (PINN) that embeds the Yakopcic state equation as a differentiable physics residual loss, enforcing physically consistent predictions even in data-sparse regimes.
Experiments are conducted on a multi-model dataset (Yakopcic, MMS, stat, VTEAM) generated by device-model simulations. Our best regression model (XGBoost) achieves R² > 0.99 on held-out data, while the Bidirectional LSTM and Temporal Transformer attain sub-1% normalized RMSE on one-step-ahead current prediction. The PINN demonstrates superior generalization compared to a purely data-driven MLP, particularly near switching transitions where physics constraints are most informative.
- Motivation & Background
- Dataset
- Methodology
- Repository Structure
- Installation & Usage
- Results
- Improvements over Baseline
- Limitations & Future Work
- Citation
- References
Memristors — or memory resistors — were theoretically postulated by Chua (1971) [1] and first fabricated in solid state by Strukov et al. at HP Labs (2008) [2]. They are characterized by a nonlinear q–φ relationship and exhibit pinched hysteresis loops in their I–V curves — a signature that has been used as an empirical criterion for memristive behavior [3].
From a circuit-modeling perspective, memristors are described by a state equation

dw/dt = f(w, V, t)

and a constitutive relation

I = g(w, V) · V,
where w is an internal state variable (e.g., the normalized doping-front position), and g(·) is a nonlinear conductance function. Different model families — Yakopcic [4], MMS [5], VTEAM [6] — propose specific functional forms for f and g, each capturing different physical mechanisms.
Why machine learning? Physics-based models require manual parameter extraction per device batch — a tedious, often ill-conditioned inverse problem. A data-driven model, once trained on measured or simulated data, can generalize to unseen operating conditions, support Monte Carlo variability analysis, and be embedded in fast SPICE-compatible surrogate models for system-level simulation.
The dataset is provided in .mat format and is publicly available on Kaggle. It contains simulation data for four memristor model families:
| Model | Type | Key Parameters | Notes |
|---|---|---|---|
| Yakopcic | Phenomenological | a1, a2, b, Vp, Vn, Ap, An, xp, xn, αp, αn | Asymmetric window function |
| MMS | Physical | Ron, Roff, uv, D, p | HP-model derivative |
| stat | Statistical | std, wsk, eps, μ, σ | Captures device-to-device variability |
| VTEAM | Threshold | koff, kon, aoff, aon, woff, won, wc | Voltage-threshold activation |
Common fields for all models: Amp, Freq, Dop, Rs, U_m, I_m, t, V, I
Each sample records the electrical state at a single time point. Data was generated by sweeping sinusoidal voltage stimuli across a range of frequencies (1 Hz – 1 MHz), amplitudes (0.1 V – 2 V), and doping levels.
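Before running the pipeline, the raw .mat file can be inspected directly with SciPy. The sketch below is illustrative: the path follows the CLI examples further down, and only the common fields listed above are assumed to exist.

```python
# Minimal sanity check of the raw .mat file (path relative to memristor_ml/).
from scipy.io import loadmat
import numpy as np

mat = loadmat("data/memristor_data.mat")

# Top-level variables, excluding MATLAB metadata entries
keys = [k for k in mat.keys() if not k.startswith("__")]
print("Variables in file:", keys)

# Peek at one array, e.g. the voltage waveform, if it is stored at the top level
if "V" in mat:
    V = np.asarray(mat["V"]).ravel()
    print(f"V: {V.size} samples, range [{V.min():.3f}, {V.max():.3f}] V")
```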
Raw measurements are enriched with physically motivated derived features before model training:
| Feature | Formula | Physical Meaning |
|---|---|---|
| R | V / I | Instantaneous resistance |
| log_R | log₁₀\|R\| | Log-scale resistance (spans decades) |
| dV/dt | ∂V/∂t | Voltage rate of change |
| dI/dt | ∂I/∂t | Current rate of change |
| V² | V² | Second-order voltage nonlinearity |
| V·I | V × I | Instantaneous power |
| V_sign | sgn(V) | Polarity indicator |
| cycle | cumulative sign changes / 2 | Voltage cycle index (for hysteresis tracking) |
A RobustScaler (median/IQR normalization) is applied to all continuous features to mitigate the effect of the heavy-tailed resistance distribution. Resistance labels (HRS/LRS) are assigned per-model using a percentile threshold to account for the different operating ranges of each model family.
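The derived features and the robust scaling step can be sketched with pandas and scikit-learn as follows. Column names mirror the table above but are not necessarily the exact identifiers used in data_loader.py, and raw_df is a placeholder for a DataFrame holding one time-ordered voltage sweep.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import RobustScaler

def add_derived_features(df: pd.DataFrame) -> pd.DataFrame:
    """Append the physically motivated features listed in the table above."""
    eps = 1e-12                                   # guard against division by zero
    out = df.copy()
    out["R"] = out["V"] / (out["I"] + eps)        # instantaneous resistance
    out["log_R"] = np.log10(np.abs(out["R"]) + eps)
    # gradients assume the samples belong to one waveform ordered in time
    out["dV_dt"] = np.gradient(out["V"], out["t"])
    out["dI_dt"] = np.gradient(out["I"], out["t"])
    out["V2"] = out["V"] ** 2
    out["VI"] = out["V"] * out["I"]               # instantaneous power
    out["V_sign"] = np.sign(out["V"])
    # cycle index: every two sign changes of V complete one voltage cycle
    s = np.sign(out["V"].to_numpy())
    changes = np.diff(s, prepend=s[0]) != 0
    out["cycle"] = np.cumsum(changes) // 2
    return out

features = add_derived_features(raw_df)           # raw_df: placeholder input DataFrame
scaled = RobustScaler().fit_transform(features.select_dtypes("number"))
print(scaled.shape)
```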
Task: Predict I given (V, t, device parameters, derived features).
Four regressors are benchmarked under identical 5-fold cross-validation:
| Model | Architecture / Notes |
|---|---|
| Ridge Regression | Linear baseline; L2 regularization |
| Gradient Boosting | sklearn GradientBoostingRegressor; 300 trees, learning rate 0.05 |
| XGBoost | 500 trees; reg_alpha=0.1; column subsampling; primary model |
| MLP (Neural Baseline) | (256, 256, 128, 64); adaptive learning rate; early stopping |
Evaluation metrics: RMSE, MAE, R², MAPE.
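A minimal sketch of the cross-validated XGBoost configuration is shown below. The hyperparameters mirror the benchmark table; the column-subsampling fraction and the variables X, y are placeholders rather than the exact setup in models/regression.py.

```python
from sklearn.model_selection import KFold, cross_val_score
from xgboost import XGBRegressor

# 500 trees, L1 regularization, column subsampling (fraction assumed).
# X, y are placeholders for the engineered feature matrix and the target current.
model = XGBRegressor(
    n_estimators=500,
    reg_alpha=0.1,
    colsample_bytree=0.8,
    random_state=42,
)

cv = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=cv, scoring="r2")
print(f"CV R2: {scores.mean():.3f} ± {scores.std():.3f}")
```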
Task: Label each measurement as High Resistance State (HRS=1) or Low Resistance State (LRS=0).
The binary threshold is defined per-model as the 50th percentile of |R| within that model's data — this per-model stratification prevents class imbalance caused by the different operating ranges of Yakopcic vs. VTEAM, for example.
| Model | Notes |
|---|---|
| Random Forest | 300 estimators; native feature importance for interpretability |
| XGBoost | 300 trees; primary model |
| SVM (RBF kernel) | C=10, gamma='scale'; probability calibration via Platt scaling |
Evaluation: accuracy, balanced accuracy, F1 (macro & weighted), ROC-AUC, confusion matrix.
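The per-model labeling rule and the XGBoost classifier can be sketched as follows; df, the model column, and feature_cols are illustrative placeholders rather than the exact identifiers in models/classification.py.

```python
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Per-model HRS/LRS labeling: threshold |R| at the 50th percentile *within* each
# model family so that the different ranges of Yakopcic and VTEAM do not bias the classes.
# df is a placeholder pandas DataFrame with a "model" column and the engineered features.
df["abs_R"] = df["R"].abs()
df["HRS"] = (
    df.groupby("model")["abs_R"]
      .transform(lambda r: (r >= r.quantile(0.50)).astype(int))
)

X = df[feature_cols]                      # feature_cols: engineered feature names (placeholder)
y = df["HRS"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=42)

clf = XGBClassifier(n_estimators=300, random_state=42)   # 300 trees as in the table
clf.fit(X_tr, y_tr)
print("Test accuracy:", clf.score(X_te, y_te))
```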
Task: Given a window of L = 20 consecutive time steps, predict the current at step L+1.
Architecture: 3 × BiLSTM layers (128 hidden units each) with per-layer LayerNorm and dropout (p = 0.2), followed by an FC head. Trained with HuberLoss, AdamW, OneCycleLR scheduler, and gradient clipping (norm = 1.0). Early stopping with patience = 15 epochs.
Why bidirectional? Within each training window the future context is available; bidirectionality improves representation quality for the sinusoidal waveforms in the dataset.
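A minimal PyTorch sketch of this architecture is given below. Layer counts and sizes follow the description above, while the class name and the width of the FC head are illustrative rather than a copy of models/lstm_model.py.

```python
import torch
import torch.nn as nn

class BiLSTMRegressor(nn.Module):
    """3 stacked bidirectional LSTM layers with per-layer LayerNorm and dropout,
    followed by a fully connected head predicting the next-step current."""
    def __init__(self, n_features: int, hidden: int = 128, n_layers: int = 3, p: float = 0.2):
        super().__init__()
        self.layers, self.norms = nn.ModuleList(), nn.ModuleList()
        in_dim = n_features
        for _ in range(n_layers):
            self.layers.append(nn.LSTM(in_dim, hidden, batch_first=True, bidirectional=True))
            self.norms.append(nn.LayerNorm(2 * hidden))
            in_dim = 2 * hidden
        self.dropout = nn.Dropout(p)
        self.head = nn.Sequential(nn.Linear(2 * hidden, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x):                       # x: (batch, seq_len=20, n_features)
        for lstm, norm in zip(self.layers, self.norms):
            x, _ = lstm(x)
            x = self.dropout(norm(x))
        return self.head(x[:, -1, :]).squeeze(-1)   # last step -> one-step-ahead prediction

model = BiLSTMRegressor(n_features=8)
criterion = nn.HuberLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
# OneCycleLR and gradient clipping (max_norm=1.0) are applied inside the training loop.
```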
An encoder-only Transformer with a learnable [CLS] token prepended to each sequence. The CLS token output is passed to a regression head.
Key design choices:
- Pre-Layer Normalization (norm_first=True) for training stability
- GELU activation in feed-forward sub-layers
- Sinusoidal positional encoding (Vaswani et al., 2017)
- CosineAnnealingLR scheduler
```
Input (B, L, F) → Linear projection (F→d_model)
→ Prepend CLS token
→ SinusoidalPE
→ TransformerEncoder (3 layers, 4 heads)
→ CLS output → FC(d/2) → GELU → FC(1)
```
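The diagram translates into roughly the following PyTorch module; the dimensions (d_model = 64, 4 heads, 3 encoder layers) follow the results table, while the class and helper names are illustrative rather than a copy of models/transformer_model.py.

```python
import math
import torch
import torch.nn as nn

class TemporalTransformer(nn.Module):
    """Encoder-only Transformer with a learnable CLS token for sequence regression."""
    def __init__(self, n_features: int, d_model: int = 64, n_heads: int = 4, n_layers: int = 3):
        super().__init__()
        self.proj = nn.Linear(n_features, d_model)
        self.cls = nn.Parameter(torch.zeros(1, 1, d_model))
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model,
            activation="gelu", norm_first=True, batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Sequential(nn.Linear(d_model, d_model // 2), nn.GELU(),
                                  nn.Linear(d_model // 2, 1))

    @staticmethod
    def sinusoidal_pe(seq_len: int, d_model: int) -> torch.Tensor:
        pos = torch.arange(seq_len).unsqueeze(1)
        div = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(seq_len, d_model)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        return pe

    def forward(self, x):                                   # x: (B, L, F)
        h = self.proj(x)                                    # (B, L, d_model)
        h = torch.cat([self.cls.expand(h.size(0), -1, -1), h], dim=1)   # prepend CLS
        h = h + self.sinusoidal_pe(h.size(1), h.size(2)).to(h.device)
        h = self.encoder(h)
        return self.head(h[:, 0]).squeeze(-1)               # regression from the CLS position
```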
The PINN encodes domain knowledge by adding a physics residual term to the training loss. The architecture outputs two quantities simultaneously: predicted current Î and estimated state variable x̂ ∈ (0, 1).
Loss function:

L_total = λ_data · ‖Î − I‖² + λ_phys · ‖∂x̂/∂t − f(V, x̂)‖²
where f(V, x̂) is the Yakopcic state equation evaluated at the predicted state, and ∂x̂/∂t is computed via automatic differentiation (PyTorch autograd) with respect to the input time t.
The state equation f(V, x) implements the asymmetric exponential window function of the Yakopcic model:

f(V, x) = g(V) · w(x), with

g(V) = Ap·(exp(V) − exp(Vp)) for V > Vp; −An·(exp(−V) − exp(Vn)) for V < −Vn; 0 otherwise

w(x) = exp(−αp·(x − xp)) for x ≥ xp (positive drift); exp(αn·(x + xn − 1)) for x ≤ 1 − xn (negative drift); 1 otherwise
Default weights: λ_data = 0.70, λ_phys = 0.30. These can be tuned in config.py.
Architecture: 5-layer MLP (128→256→256→128→64 neurons, Tanh activations), with Xavier weight initialization and a Sigmoid output head for the state variable x̂.
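The physics residual can be sketched in PyTorch as follows, assuming the network returns the pair (Î, x̂) and using the simplified Yakopcic form above with illustrative parameter values; names and signatures do not correspond one-to-one to models/pinn.py.

```python
import math
import torch

def yakopcic_f(V, x, Ap=1.0, An=1.0, Vp=0.5, Vn=0.5, xp=0.3, xn=0.5, alpha_p=1.0, alpha_n=1.0):
    """Simplified Yakopcic state equation dx/dt = g(V) * w(x).
    Parameter values are illustrative defaults, not fitted device constants."""
    g = torch.where(V > Vp, Ap * (torch.exp(V) - math.exp(Vp)),
        torch.where(V < -Vn, -An * (torch.exp(-V) - math.exp(Vn)),
                    torch.zeros_like(V)))
    w_pos = torch.where(x >= xp, torch.exp(-alpha_p * (x - xp)), torch.ones_like(x))
    w_neg = torch.where(x <= 1 - xn, torch.exp(alpha_n * (x + xn - 1)), torch.ones_like(x))
    return g * torch.where(g >= 0, w_pos, w_neg)   # window chosen by drift direction

def pinn_loss(net, V, t, I_true, lam_data=0.70, lam_phys=0.30):
    """Composite loss: data MSE + Yakopcic ODE residual, with dx̂/dt from autograd."""
    t = t.clone().requires_grad_(True)              # track t so autograd can give dx̂/dt
    I_hat, x_hat = net(torch.stack([V, t], dim=-1)) # net is assumed to return (Î, x̂)
    dxdt = torch.autograd.grad(x_hat.sum(), t, create_graph=True)[0]
    data_loss = torch.mean((I_hat - I_true) ** 2)
    phys_loss = torch.mean((dxdt - yakopcic_f(V, x_hat)) ** 2)
    return lam_data * data_loss + lam_phys * phys_loss
```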
```
memristor_ml/
│
├── config.py              # Centralized hyperparameters & paths
├── data_loader.py         # Data loading, feature engineering, dataset splits
├── visualization.py       # All publication-quality plots
├── main.py                # CLI entry point — orchestrates all stages
├── requirements.txt
│
├── models/
│   ├── __init__.py
│   ├── regression.py        # Ridge, GBR, XGBoost, MLP regressors
│   ├── classification.py    # RF, XGBoost, SVM classifiers
│   ├── lstm_model.py        # Bidirectional LSTM + trainer
│   ├── transformer_model.py # Temporal Transformer + trainer
│   └── pinn.py              # Physics-Informed Neural Network + trainer
│
└── outputs/
    ├── figures/           # Auto-generated plots (PNG)
    ├── saved_models/      # Serialised weights (.pkl / .pt)
    └── results.csv        # Consolidated metric table
```
- Python ≥ 3.10
- CUDA-capable GPU recommended for LSTM / Transformer / PINN (optional)
```bash
# 1. Clone the repository
git clone https://github.com/PanagiotaGr/Modeling-Memristors-with-Machine-Learning.git
cd Modeling-Memristors-with-Machine-Learning

# 2. Create a virtual environment
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. [Optional] GPU support — replace with your CUDA version
pip install torch --index-url https://download.pytorch.org/whl/cu118
```

Download the .mat file from Kaggle and place it under memristor_ml/data/:

```bash
kaggle datasets download <dataset-slug>
mv memristor_data.mat memristor_ml/data/
```

```bash
# All stages
python main.py --data data/memristor_data.mat

# Regression + Classification only (no deep learning)
python main.py --data data/memristor_data.mat --stages regression classification

# Sequential models for VTEAM
python main.py --data data/memristor_data.mat --stages sequential --seq-model VTEAM

# Skip PINN (faster)
python main.py --data data/memristor_data.mat --no-pinn
```

All figures are saved to outputs/figures/ and all metrics to outputs/results.csv.
```python
from data_loader import MemristorDataset
from models import run_regression_benchmark, PINNTrainer

# Load & preprocess
ds = MemristorDataset("data/memristor_data.mat").load()
print(ds.summary())

# Regression
X_tr, X_te, y_tr, y_te = ds.regression_split()
models, results = run_regression_benchmark(X_tr, X_te, y_tr, y_te)
print(results)

# Build sequence windows (LSTM / Transformer)
X_seq_tr, X_seq_te, y_seq_tr, y_seq_te = ds.build_sequences("Yakopcic", seq_len=20)
```

Results below are indicative. Exact values depend on the dataset version and random seed.
| Model | R² | RMSE | MAE | CV R² (μ ± σ) |
|---|---|---|---|---|
| Ridge Regression | 0.8521 | 3.12e-4 | 2.01e-4 | 0.849 ± 0.012 |
| Gradient Boosting | 0.9834 | 1.04e-4 | 5.2e-5 | 0.981 ± 0.004 |
| XGBoost | 0.994 | 5.9e-5 | 2.8e-5 | 0.993 ± 0.002 |
| MLP (Neural) | 0.9712 | 1.38e-4 | 7.1e-5 | 0.969 ± 0.006 |
| Model | Accuracy | Balanced Acc | F1 Macro | ROC-AUC |
|---|---|---|---|---|
| Random Forest | 0.9741 | 0.9739 | 0.9741 | 0.9981 |
| XGBoost | 0.9803 | 0.9801 | 0.9803 | 0.9994 |
| SVM (RBF) | 0.9688 | 0.9685 | 0.9688 | 0.9971 |
| Model | R² | RMSE | Notes |
|---|---|---|---|
| BiLSTM | 0.9912 | 4.1e-5 | 3 layers, bidirectional, 128 units |
| Transformer | 0.9888 | 4.8e-5 | 3 encoder layers, 4 heads, d=64 |
| Configuration | R² | RMSE | Physics Loss | Notes |
|---|---|---|---|---|
| λ_phys = 0.30 | 0.9841 | 6.7e-5 | 2.3e-6 | Balanced data + physics |
| λ_phys = 0.00 | 0.9795 | 7.9e-5 | — | Pure data-driven MLP baseline |
| λ_phys = 0.50 | 0.9818 | 7.2e-5 | 1.1e-6 | Stronger physics constraint |
The PINN with λ_phys = 0.30 outperforms the pure data-driven MLP by ~15% RMSE, with the largest gains near switching transitions where the data-only model exhibits spurious oscillations.
This refactored codebase improves upon the original Jupyter notebook in the following ways:
| Aspect | Original | This Work |
|---|---|---|
| Code structure | Single monolithic notebook | Modular Python package with clear separation of concerns |
| Regression model | Linear regression + shallow MLP | XGBoost + GBR + MLP benchmark with cross-validation |
| Classification | Random Forest | RF + XGBoost + SVM with stratified CV and ROC-AUC |
| Sequential model | Vanilla 1-layer LSTM | 3-layer Bidirectional LSTM + Temporal Transformer |
| Physics integration | None | Full PINN with Yakopcic ODE residual via autograd |
| Feature engineering | Raw V, I, t | 8 derived features including dV/dt, dI/dt, power, cycle index |
| Evaluation | Single train/test split | k-fold CV + held-out test + multiple metrics |
| Visualizations | Basic matplotlib | 10 publication-quality figures with colorbars, phase portraits |
| Reproducibility | Implicit seeds | Centralized RANDOM_SEED = 42 across all models |
| Scalers | StandardScaler | RobustScaler (handles heavy-tailed R distribution) |
| Resistance labeling | Global threshold | Per-model percentile threshold (handles range heterogeneity) |
| CLI | None | Argument-parsed main.py with stage selection |
| Configuration | Scattered magic numbers | config.py with typed dataclasses |
Current limitations:
- The PINN physics loss is based on the Yakopcic model; extending it to VTEAM or MMS would require deriving their respective ODEs in PyTorch-differentiable form.
- Sequential models are trained per-device-model; a cross-model unified sequence model would be more generalizable.
- The dataset is simulation-based; validation on real fabricated device measurements remains an open item.
Directions for future work:
- Neural ODEs / Continuous-depth models: Replace the discrete LSTM with a Neural ODE that directly parametrizes the state equation dx/dt = NN(x, V, t) — more physically interpretable and naturally handles irregular time sampling.
- Bayesian neural networks: Quantify predictive uncertainty, especially near switching boundaries where small voltage changes produce large resistance jumps.
- Transfer learning across models: Pre-train on the large Yakopcic dataset and fine-tune on the smaller VTEAM split.
- SPICE integration: Export trained XGBoost or PINN models as Verilog-A behavioral models for system-level simulation.
- Experimental dataset: Apply the pipeline to measured I–V data from TiO₂, HfO₂, or PCM devices to validate generalization.
If you use this work, please cite:
```bibtex
@software{Grosdouli_Memristor_ML_2026,
  author  = {Grosdouli, Panagiota},
  title   = {Modeling Memristor Dynamics with Machine Learning, LSTM, Transformers, and PINNs},
  year    = {2026},
  url     = {https://github.com/PanagiotaGr/Modeling-Memristors-with-Machine-Learning},
  license = {MIT}
}
```

[1] L. O. Chua, "Memristor–The missing circuit element," IEEE Trans. Circuit Theory, vol. CT-18, no. 5, pp. 507–519, 1971.

[2] D. B. Strukov, G. S. Snider, D. R. Stewart, and R. S. Williams, "The missing memristor found," Nature, vol. 453, pp. 80–83, 2008.

[4] C. Yakopcic, T. M. Taha, G. Subramanyam, R. E. Pino, and S. Rogers, "A memristor device model," IEEE Electron Device Lett., vol. 32, no. 10, pp. 1436–1438, 2011.

[5] Z. Biolek, D. Biolek, and V. Biolkova, "SPICE model of memristor with nonlinear dopant drift," Radioengineering, vol. 18, no. 2, pp. 210–214, 2009.

[6] S. Kvatinsky et al., "VTEAM: A general SPICE-compatible model for voltage-controlled memristors," IEEE Trans. Circuits Syst. II, vol. 62, no. 8, pp. 786–790, 2015.

[7] M. Raissi, P. Perdikaris, and G. E. Karniadakis, "Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations," J. Comput. Phys., vol. 378, pp. 686–707, 2019.

[8] A. Vaswani et al., "Attention is all you need," Advances in Neural Information Processing Systems, vol. 30, 2017.
