Data-driven identification of memristive device behavior using XGBoost, Bidirectional LSTMs, Temporal Transformers, and Physics-Informed Neural Networks
Memristors are two-terminal nonlinear devices whose resistance depends on the history of applied voltage and current — a property that makes them compelling candidates for neuromorphic computing, non-volatile memory, and hardware neural networks. Yet their highly nonlinear, hysteretic behavior renders classical analytical models computationally expensive and difficult to generalize across fabrication variants.
This work presents a complete machine learning framework for data-driven memristor modeling. We address three interconnected problems: (1) regression — predicting instantaneous current I from voltage V and device parameters; (2) classification — identifying high- and low-resistance states (HRS/LRS) from electrical measurements; and (3) temporal dynamics — capturing time-dependent behavior using deep sequential models. We further introduce a Physics-Informed Neural Network (PINN) that embeds the Yakopcic state equation as a differentiable physics residual loss, enforcing physically consistent predictions even in data-sparse regimes.
Experiments are conducted on a multi-model dataset (Yakopcic, MMS, stat, VTEAM) generated by device-model simulations. Our best regression model (XGBoost) achieves R² > 0.99 on held-out data, while the Bidirectional LSTM and Temporal Transformer attain sub-1% normalized RMSE on one-step-ahead current prediction. The PINN demonstrates superior generalization compared to a purely data-driven MLP, particularly near switching transitions where physics constraints are most informative.
- Motivation & Background
- Dataset
- Methodology
- Repository Structure
- Installation & Usage
- Results
- Improvements over Baseline
- Limitations & Future Work
- Citation
- References
Memristors — or memory resistors — were theoretically postulated by Chua (1971) [1] and first fabricated in solid state by Strukov et al. at HP Labs (2008) [2]. They are characterized by a nonlinear q–φ relationship and exhibit pinched hysteresis loops in their I–V curves — a signature that has been used as an empirical criterion for memristive behavior [3].
From a circuit-modeling perspective, memristors are described by a state equation

dw/dt = f(w, V, t)

and a constitutive relation

I = g(w, V) · V,
where w is an internal state variable (e.g., the normalized doping-front position), and g(·) is a nonlinear conductance function. Different model families — Yakopcic [4], MMS [5], VTEAM [6] — propose specific functional forms for f and g, each capturing different physical mechanisms.
Why machine learning? Physics-based models require manual parameter extraction per device batch — a tedious, often ill-conditioned inverse problem. A data-driven model, once trained on measured or simulated data, can generalize to unseen operating conditions, support Monte Carlo variability analysis, and be embedded in fast SPICE-compatible surrogate models for system-level simulation.
The dataset is provided in .mat format and is publicly available on Kaggle. It contains simulation data for four memristor model families:
| Model | Type | Key Parameters | Notes |
|---|---|---|---|
| Yakopcic | Phenomenological | a1, a2, b, Vp, Vn, Ap, An, xp, xn, αp, αn | Asymmetric window function |
| MMS | Physical | Ron, Roff, uv, D, p | HP-model derivative |
| stat | Statistical | std, wsk, eps, μ, σ | Captures device-to-device variability |
| VTEAM | Threshold | koff, kon, aoff, aon, woff, won, wc | Voltage-threshold activation |
Common fields for all models: Amp, Freq, Dop, Rs, U_m, I_m, t, V, I
Each sample records the electrical state at a single time point. Data was generated by sweeping sinusoidal voltage stimuli across a range of frequencies (1 Hz – 1 MHz), amplitudes (0.1 V – 2 V), and doping levels.
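Before running the pipeline, the raw .mat file can be inspected directly with SciPy. The sketch below is illustrative: the path follows the CLI examples further down, and only the common fields listed above are assumed to exist.

```python
# Minimal sanity check of the raw .mat file (path relative to memristor_ml/).
from scipy.io import loadmat
import numpy as np

mat = loadmat("data/memristor_data.mat")

# Top-level variables, excluding MATLAB metadata entries
keys = [k for k in mat.keys() if not k.startswith("__")]
print("Variables in file:", keys)

# Peek at one array, e.g. the voltage waveform, if it is stored at the top level
if "V" in mat:
    V = np.asarray(mat["V"]).ravel()
    print(f"V: {V.size} samples, range [{V.min():.3f}, {V.max():.3f}] V")
```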
Raw measurements are enriched with physically motivated derived features before model training:
| Feature | Formula | Physical Meaning |
|---|---|---|
| R | V / I | Instantaneous resistance |
| log_R | log₁₀\|R\| | Log-scale resistance (spans decades) |
| dV/dt | ∂V/∂t | Voltage rate of change |
| dI/dt | ∂I/∂t | Current rate of change |
| V² | V² | Second-order voltage nonlinearity |
| V·I | V × I | Instantaneous power |
| V_sign | sgn(V) | Polarity indicator |
| cycle | cumulative sign changes / 2 | Voltage cycle index (for hysteresis tracking) |
A RobustScaler (median/IQR normalization) is applied to all continuous features to mitigate the effect of the heavy-tailed resistance distribution. Resistance labels (HRS/LRS) are assigned per-model using a percentile threshold to account for the different operating ranges of each model family.
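The derived features and the robust scaling step can be sketched with pandas and scikit-learn as follows. Column names mirror the table above but are not necessarily the exact identifiers used in data_loader.py, and raw_df is a placeholder for a DataFrame holding one time-ordered voltage sweep.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import RobustScaler

def add_derived_features(df: pd.DataFrame) -> pd.DataFrame:
    """Append the physically motivated features listed in the table above."""
    eps = 1e-12                                   # guard against division by zero
    out = df.copy()
    out["R"] = out["V"] / (out["I"] + eps)        # instantaneous resistance
    out["log_R"] = np.log10(np.abs(out["R"]) + eps)
    # gradients assume the samples belong to one waveform ordered in time
    out["dV_dt"] = np.gradient(out["V"], out["t"])
    out["dI_dt"] = np.gradient(out["I"], out["t"])
    out["V2"] = out["V"] ** 2
    out["VI"] = out["V"] * out["I"]               # instantaneous power
    out["V_sign"] = np.sign(out["V"])
    # cycle index: every two sign changes of V complete one voltage cycle
    s = np.sign(out["V"].to_numpy())
    changes = np.diff(s, prepend=s[0]) != 0
    out["cycle"] = np.cumsum(changes) // 2
    return out

features = add_derived_features(raw_df)           # raw_df: placeholder input DataFrame
scaled = RobustScaler().fit_transform(features.select_dtypes("number"))
print(scaled.shape)
```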
Task: Predict I given (V, t, device parameters, derived features).
Four regressors are benchmarked under identical 5-fold cross-validation:
| Model | Architecture / Notes |
|---|---|
| Ridge Regression | Linear baseline; L2 regularization |
| Gradient Boosting | sklearn GradientBoostingRegressor; 300 trees, learning rate 0.05 |
| XGBoost | 500 trees; reg_alpha=0.1; column subsampling; primary model |
| MLP (Neural Baseline) | (256, 256, 128, 64); adaptive learning rate; early stopping |
Evaluation metrics: RMSE, MAE, R², MAPE.
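A minimal sketch of the cross-validated XGBoost configuration is shown below. The hyperparameters mirror the benchmark table; the column-subsampling fraction and the variables X, y are placeholders rather than the exact setup in models/regression.py.

```python
from sklearn.model_selection import KFold, cross_val_score
from xgboost import XGBRegressor

# 500 trees, L1 regularization, column subsampling (fraction assumed).
# X, y are placeholders for the engineered feature matrix and the target current.
model = XGBRegressor(
    n_estimators=500,
    reg_alpha=0.1,
    colsample_bytree=0.8,
    random_state=42,
)

cv = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=cv, scoring="r2")
print(f"CV R2: {scores.mean():.3f} ± {scores.std():.3f}")
```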
Task: Label each measurement as High Resistance State (HRS=1) or Low Resistance State (LRS=0).
The binary threshold is defined per-model as the 50th percentile of |R| within that model's data — this per-model stratification prevents class imbalance caused by the different operating ranges of Yakopcic vs. VTEAM, for example.
| Model | Notes |
|---|---|
| Random Forest | 300 estimators; native feature importance for interpretability |
| XGBoost | 300 trees; primary model |
| SVM (RBF kernel) | C=10, gamma='scale'; probability calibration via Platt scaling |
Evaluation: accuracy, balanced accuracy, F1 (macro & weighted), ROC-AUC, confusion matrix.
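The per-model labeling rule and the XGBoost classifier can be sketched as follows; df, the model column, and feature_cols are illustrative placeholders rather than the exact identifiers in models/classification.py.

```python
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Per-model HRS/LRS labeling: threshold |R| at the 50th percentile *within* each
# model family so that the different ranges of Yakopcic and VTEAM do not bias the classes.
# df is a placeholder pandas DataFrame with a "model" column and the engineered features.
df["abs_R"] = df["R"].abs()
df["HRS"] = (
    df.groupby("model")["abs_R"]
      .transform(lambda r: (r >= r.quantile(0.50)).astype(int))
)

X = df[feature_cols]                      # feature_cols: engineered feature names (placeholder)
y = df["HRS"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=42)

clf = XGBClassifier(n_estimators=300, random_state=42)   # 300 trees as in the table
clf.fit(X_tr, y_tr)
print("Test accuracy:", clf.score(X_te, y_te))
```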
Task: Given a window of L = 20 consecutive time steps, predict the current at step L+1.
Architecture: 3 × BiLSTM layers (128 hidden units each) with per-layer LayerNorm and dropout (p = 0.2), followed by an FC head. Trained with HuberLoss, AdamW, OneCycleLR scheduler, and gradient clipping (norm = 1.0). Early stopping with patience = 15 epochs.
Why bidirectional? Within each training window the future context is available; bidirectionality improves representation quality for the sinusoidal waveforms in the dataset.
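A minimal PyTorch sketch of this architecture is given below. Layer counts and sizes follow the description above, while the class name and the width of the FC head are illustrative rather than a copy of models/lstm_model.py.

```python
import torch
import torch.nn as nn

class BiLSTMRegressor(nn.Module):
    """3 stacked bidirectional LSTM layers with per-layer LayerNorm and dropout,
    followed by a fully connected head predicting the next-step current."""
    def __init__(self, n_features: int, hidden: int = 128, n_layers: int = 3, p: float = 0.2):
        super().__init__()
        self.layers, self.norms = nn.ModuleList(), nn.ModuleList()
        in_dim = n_features
        for _ in range(n_layers):
            self.layers.append(nn.LSTM(in_dim, hidden, batch_first=True, bidirectional=True))
            self.norms.append(nn.LayerNorm(2 * hidden))
            in_dim = 2 * hidden
        self.dropout = nn.Dropout(p)
        self.head = nn.Sequential(nn.Linear(2 * hidden, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x):                       # x: (batch, seq_len=20, n_features)
        for lstm, norm in zip(self.layers, self.norms):
            x, _ = lstm(x)
            x = self.dropout(norm(x))
        return self.head(x[:, -1, :]).squeeze(-1)   # last step -> one-step-ahead prediction

model = BiLSTMRegressor(n_features=8)
criterion = nn.HuberLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
# OneCycleLR and gradient clipping (max_norm=1.0) are applied inside the training loop.
```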
An encoder-only Transformer with a learnable [CLS] token prepended to each sequence. The CLS token output is passed to a regression head.
Key design choices:
- Pre-Layer Normalization (norm_first=True) for training stability
- GELU activation in feed-forward sub-layers
- Sinusoidal positional encoding (Vaswani et al., 2017)
- CosineAnnealingLR scheduler
```
Input (B, L, F) → Linear projection (F→d_model)
→ Prepend CLS token
→ SinusoidalPE
→ TransformerEncoder (3 layers, 4 heads)
→ CLS output → FC(d/2) → GELU → FC(1)
```
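The diagram translates into roughly the following PyTorch module; the dimensions (d_model = 64, 4 heads, 3 encoder layers) follow the results table, while the class and helper names are illustrative rather than a copy of models/transformer_model.py.

```python
import math
import torch
import torch.nn as nn

class TemporalTransformer(nn.Module):
    """Encoder-only Transformer with a learnable CLS token for sequence regression."""
    def __init__(self, n_features: int, d_model: int = 64, n_heads: int = 4, n_layers: int = 3):
        super().__init__()
        self.proj = nn.Linear(n_features, d_model)
        self.cls = nn.Parameter(torch.zeros(1, 1, d_model))
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model,
            activation="gelu", norm_first=True, batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Sequential(nn.Linear(d_model, d_model // 2), nn.GELU(),
                                  nn.Linear(d_model // 2, 1))

    @staticmethod
    def sinusoidal_pe(seq_len: int, d_model: int) -> torch.Tensor:
        pos = torch.arange(seq_len).unsqueeze(1)
        div = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(seq_len, d_model)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        return pe

    def forward(self, x):                                   # x: (B, L, F)
        h = self.proj(x)                                    # (B, L, d_model)
        h = torch.cat([self.cls.expand(h.size(0), -1, -1), h], dim=1)   # prepend CLS
        h = h + self.sinusoidal_pe(h.size(1), h.size(2)).to(h.device)
        h = self.encoder(h)
        return self.head(h[:, 0]).squeeze(-1)               # regression from the CLS position
```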
The PINN encodes domain knowledge by adding a physics residual term to the training loss. The architecture outputs two quantities simultaneously: predicted current Î and estimated state variable x̂ ∈ (0, 1).
Loss function:

L_total = λ_data · ‖Î − I‖² + λ_phys · ‖∂x̂/∂t − f(V, x̂)‖²
where f(V, x̂) is the Yakopcic state equation evaluated at the predicted state, and ∂x̂/∂t is computed via automatic differentiation (PyTorch autograd) with respect to the input time t.
The state equation f(V, x) implements the asymmetric exponential window function of the Yakopcic model:

f(V, x) = g(V) · w(x), with

g(V) = Ap·(exp(V) − exp(Vp)) for V > Vp; −An·(exp(−V) − exp(Vn)) for V < −Vn; 0 otherwise

w(x) = exp(−αp·(x − xp)) for x ≥ xp (positive drift); exp(αn·(x + xn − 1)) for x ≤ 1 − xn (negative drift); 1 otherwise
Default weights: λ_data = 0.70, λ_phys = 0.30. These can be tuned in config.py.
Architecture: 5-layer MLP (128→256→256→128→64 neurons, Tanh activations), with Xavier weight initialization and a Sigmoid output head for the state variable x̂.
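The physics residual can be sketched in PyTorch as follows, assuming the network returns the pair (Î, x̂) and using the simplified Yakopcic form above with illustrative parameter values; names and signatures do not correspond one-to-one to models/pinn.py.

```python
import math
import torch

def yakopcic_f(V, x, Ap=1.0, An=1.0, Vp=0.5, Vn=0.5, xp=0.3, xn=0.5, alpha_p=1.0, alpha_n=1.0):
    """Simplified Yakopcic state equation dx/dt = g(V) * w(x).
    Parameter values are illustrative defaults, not fitted device constants."""
    g = torch.where(V > Vp, Ap * (torch.exp(V) - math.exp(Vp)),
        torch.where(V < -Vn, -An * (torch.exp(-V) - math.exp(Vn)),
                    torch.zeros_like(V)))
    w_pos = torch.where(x >= xp, torch.exp(-alpha_p * (x - xp)), torch.ones_like(x))
    w_neg = torch.where(x <= 1 - xn, torch.exp(alpha_n * (x + xn - 1)), torch.ones_like(x))
    return g * torch.where(g >= 0, w_pos, w_neg)   # window chosen by drift direction

def pinn_loss(net, V, t, I_true, lam_data=0.70, lam_phys=0.30):
    """Composite loss: data MSE + Yakopcic ODE residual, with dx̂/dt from autograd."""
    t = t.clone().requires_grad_(True)              # track t so autograd can give dx̂/dt
    I_hat, x_hat = net(torch.stack([V, t], dim=-1)) # net is assumed to return (Î, x̂)
    dxdt = torch.autograd.grad(x_hat.sum(), t, create_graph=True)[0]
    data_loss = torch.mean((I_hat - I_true) ** 2)
    phys_loss = torch.mean((dxdt - yakopcic_f(V, x_hat)) ** 2)
    return lam_data * data_loss + lam_phys * phys_loss
```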
```
memristor_ml/
│
├── config.py              # Centralized hyperparameters & paths
├── data_loader.py         # Data loading, feature engineering, dataset splits
├── visualization.py       # All publication-quality plots
├── main.py                # CLI entry point — orchestrates all stages
├── requirements.txt
│
├── models/
│   ├── __init__.py
│   ├── regression.py        # Ridge, GBR, XGBoost, MLP regressors
│   ├── classification.py    # RF, XGBoost, SVM classifiers
│   ├── lstm_model.py        # Bidirectional LSTM + trainer
│   ├── transformer_model.py # Temporal Transformer + trainer
│   └── pinn.py              # Physics-Informed Neural Network + trainer
│
└── outputs/
    ├── figures/           # Auto-generated plots (PNG)
    ├── saved_models/      # Serialised weights (.pkl / .pt)
    └── results.csv        # Consolidated metric table
```
- Python ≥ 3.10
- CUDA-capable GPU recommended for LSTM / Transformer / PINN (optional)
```bash
# 1. Clone the repository
git clone https://github.com/PanagiotaGr/Modeling-Memristors-with-Machine-Learning.git
cd Modeling-Memristors-with-Machine-Learning

# 2. Create a virtual environment
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. [Optional] GPU support — replace with your CUDA version
pip install torch --index-url https://download.pytorch.org/whl/cu118
```

Download the .mat file from Kaggle and place it under memristor_ml/data/:

```bash
kaggle datasets download <dataset-slug>
mv memristor_data.mat memristor_ml/data/
```

```bash
# All stages
python main.py --data data/memristor_data.mat

# Regression + Classification only (no deep learning)
python main.py --data data/memristor_data.mat --stages regression classification

# Sequential models for VTEAM
python main.py --data data/memristor_data.mat --stages sequential --seq-model VTEAM

# Skip PINN (faster)
python main.py --data data/memristor_data.mat --no-pinn
```

All figures are saved to outputs/figures/ and all metrics to outputs/results.csv.
```python
from data_loader import MemristorDataset
from models import run_regression_benchmark, PINNTrainer

# Load & preprocess
ds = MemristorDataset("data/memristor_data.mat").load()
print(ds.summary())

# Regression
X_tr, X_te, y_tr, y_te = ds.regression_split()
models, results = run_regression_benchmark(X_tr, X_te, y_tr, y_te)
print(results)

# Build sequence windows (LSTM / Transformer)
X_seq_tr, X_seq_te, y_seq_tr, y_seq_te = ds.build_sequences("Yakopcic", seq_len=20)
```

Results below are indicative. Exact values depend on the dataset version and random seed.
| Model | R² | RMSE | MAE | CV R² (μ ± σ) |
|---|---|---|---|---|
| Ridge Regression | 0.8521 | 3.12e-4 | 2.01e-4 | 0.849 ± 0.012 |
| Gradient Boosting | 0.9834 | 1.04e-4 | 5.2e-5 | 0.981 ± 0.004 |
| XGBoost | 0.994 | 5.9e-5 | 2.8e-5 | 0.993 ± 0.002 |
| MLP (Neural) | 0.9712 | 1.38e-4 | 7.1e-5 | 0.969 ± 0.006 |
| Model | Accuracy | Balanced Acc | F1 Macro | ROC-AUC |
|---|---|---|---|---|
| Random Forest | 0.9741 | 0.9739 | 0.9741 | 0.9981 |
| XGBoost | 0.9803 | 0.9801 | 0.9803 | 0.9994 |
| SVM (RBF) | 0.9688 | 0.9685 | 0.9688 | 0.9971 |
| Model | R² | RMSE | Notes |
|---|---|---|---|
| BiLSTM | 0.9912 | 4.1e-5 | 3 layers, bidirectional, 128 units |
| Transformer | 0.9888 | 4.8e-5 | 3 encoder layers, 4 heads, d=64 |
| Configuration | R² | RMSE | Physics Loss | Notes |
|---|---|---|---|---|
| λ_phys = 0.30 | 0.9841 | 6.7e-5 | 2.3e-6 | Balanced data + physics |
| λ_phys = 0.00 | 0.9795 | 7.9e-5 | — | Pure data-driven MLP baseline |
| λ_phys = 0.50 | 0.9818 | 7.2e-5 | 1.1e-6 | Stronger physics constraint |
The PINN with λ_phys = 0.30 outperforms the pure data-driven MLP by ~15% RMSE, with the largest gains near switching transitions where the data-only model exhibits spurious oscillations.
This refactored codebase improves upon the original Jupyter notebook in the following ways:
| Aspect | Original | This Work |
|---|---|---|
| Code structure | Single monolithic notebook | Modular Python package with clear separation of concerns |
| Regression model | Linear regression + shallow MLP | XGBoost + GBR + MLP benchmark with cross-validation |
| Classification | Random Forest | RF + XGBoost + SVM with stratified CV and ROC-AUC |
| Sequential model | Vanilla 1-layer LSTM | 3-layer Bidirectional LSTM + Temporal Transformer |
| Physics integration | None | Full PINN with Yakopcic ODE residual via autograd |
| Feature engineering | Raw V, I, t | 8 derived features including dV/dt, dI/dt, power, cycle index |
| Evaluation | Single train/test split | k-fold CV + held-out test + multiple metrics |
| Visualizations | Basic matplotlib | 10 publication-quality figures with colorbars, phase portraits |
| Reproducibility | Implicit seeds | Centralized RANDOM_SEED = 42 across all models |
| Scalers | StandardScaler | RobustScaler (handles heavy-tailed R distribution) |
| Resistance labeling | Global threshold | Per-model percentile threshold (handles range heterogeneity) |
| CLI | None | Argument-parsed main.py with stage selection |
| Configuration | Scattered magic numbers | config.py with typed dataclasses |
Current limitations:
- The PINN physics loss is based on the Yakopcic model; extending it to VTEAM or MMS would require deriving their respective ODEs in PyTorch-differentiable form.
- Sequential models are trained per-device-model; a cross-model unified sequence model would be more generalizable.
- The dataset is simulation-based; validation on real fabricated device measurements remains an open item.
Directions for future work:
- Neural ODEs / Continuous-depth models: Replace the discrete LSTM with a Neural ODE that directly parametrizes the state equation dx/dt = NN(x, V, t) — more physically interpretable and naturally handles irregular time sampling.
- Bayesian neural networks: Quantify predictive uncertainty, especially near switching boundaries where small voltage changes produce large resistance jumps.
- Transfer learning across models: Pre-train on the large Yakopcic dataset and fine-tune on the smaller VTEAM split.
- SPICE integration: Export trained XGBoost or PINN models as Verilog-A behavioral models for system-level simulation.
- Experimental dataset: Apply the pipeline to measured I–V data from TiO₂, HfO₂, or PCM devices to validate generalization.
If you use this work, please cite:
```bibtex
@software{Grosdouli_Memristor_ML_2026,
  author  = {Grosdouli, Panagiota},
  title   = {Modeling Memristor Dynamics with Machine Learning, LSTM, Transformers, and PINNs},
  year    = {2026},
  url     = {https://github.com/PanagiotaGr/Modeling-Memristors-with-Machine-Learning},
  license = {MIT}
}
```

[1] L. O. Chua, "Memristor–The missing circuit element," IEEE Trans. Circuit Theory, vol. CT-18, no. 5, pp. 507–519, 1971.

[2] D. B. Strukov, G. S. Snider, D. R. Stewart, and R. S. Williams, "The missing memristor found," Nature, vol. 453, pp. 80–83, 2008.

[4] C. Yakopcic, T. M. Taha, G. Subramanyam, R. E. Pino, and S. Rogers, "A memristor device model," IEEE Electron Device Lett., vol. 32, no. 10, pp. 1436–1438, 2011.

[5] Z. Biolek, D. Biolek, and V. Biolkova, "SPICE model of memristor with nonlinear dopant drift," Radioengineering, vol. 18, no. 2, pp. 210–214, 2009.

[6] S. Kvatinsky et al., "VTEAM: A general SPICE-compatible model for voltage-controlled memristors," IEEE Trans. Circuits Syst. II, vol. 62, no. 8, pp. 786–790, 2015.

[7] M. Raissi, P. Perdikaris, and G. E. Karniadakis, "Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations," J. Comput. Phys., vol. 378, pp. 686–707, 2019.

[8] A. Vaswani et al., "Attention is all you need," Advances in Neural Information Processing Systems, vol. 30, 2017.
