
Modeling Memristors with Machine Learning

Data-driven identification of memristive device behavior using XGBoost, Bidirectional LSTMs, Temporal Transformers, and Physics-Informed Neural Networks

Figure: I–V hysteresis loop.


Abstract

Memristors are two-terminal nonlinear devices whose resistance depends on the history of applied voltage and current — a property that makes them compelling candidates for neuromorphic computing, non-volatile memory, and hardware neural networks. Yet their highly nonlinear, hysteretic behavior renders classical analytical models computationally expensive and difficult to generalize across fabrication variants.

This work presents a complete machine learning framework for data-driven memristor modeling. We address three interconnected problems: (1) regression — predicting instantaneous current I from voltage V and device parameters; (2) classification — identifying high- and low-resistance states (HRS/LRS) from electrical measurements; and (3) temporal dynamics — capturing time-dependent behavior using deep sequential models. We further introduce a Physics-Informed Neural Network (PINN) that embeds the Yakopcic state equation as a differentiable physics residual loss, enforcing physically consistent predictions even in data-sparse regimes.

Experiments are conducted on a multi-model dataset (Yakopcic, MMS, stat, VTEAM) generated by device simulations. Our best regression model (XGBoost) achieves R² > 0.99 on held-out data, while the Bidirectional LSTM and Temporal Transformer attain sub-1% normalized RMSE on one-step-ahead current prediction. The PINN demonstrates superior generalization compared to a purely data-driven MLP, particularly near switching transitions, where physics constraints are most informative.


Table of Contents

  1. Motivation & Background
  2. Dataset
  3. Methodology
  4. Repository Structure
  5. Installation & Usage
  6. Results
  7. Improvements over Baseline
  8. Limitations & Future Work
  9. Citation
  10. References

1. Motivation & Background

Memristors — or memory resistors — were theoretically postulated by Chua (1971) [1] and first fabricated in solid state by Strukov et al. at HP Labs (2008) [2]. They are characterized by a nonlinear q–φ relationship and exhibit pinched hysteresis loops in their I–V curves — a signature that has been used as an empirical criterion for memristive behavior [3].

From a circuit-modeling perspective, memristors are described by a state equation:

$$\frac{dw}{dt} = f(w, V)$$

and a constitutive relation:

$$I = g(w, V) \cdot V$$

where w is an internal state variable (e.g., the normalized doping-front position), and g(·) is a nonlinear conductance function. Different model families — Yakopcic [4], MMS [5], VTEAM [6] — propose specific functional forms for f and g, each capturing different physical mechanisms.
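For concreteness, the linear dopant-drift model of Strukov et al. [2] is one widely used instance of this pair (there the state is driven by current rather than voltage):

$$\frac{dw}{dt} = \frac{\mu_v R_{on}}{D^2} \, I, \qquad I = \frac{V}{R_{on} w + R_{off} (1 - w)}$$

where μ_v is the dopant mobility, D the film thickness, and the conductance g(w, V) = 1 / (R_on w + R_off (1 − w)) interpolates between the low- and high-resistance limits.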

Why machine learning? Physics-based models require manual parameter extraction per device batch — a tedious, often ill-conditioned inverse problem. A data-driven model, once trained on measured or simulated data, can generalize to unseen operating conditions, support Monte Carlo variability analysis, and be embedded in fast SPICE-compatible surrogate models for system-level simulation.


2. Dataset

The dataset is provided in .mat format and is publicly available on Kaggle. It contains simulation data for four memristor model families:

| Model | Type | Key Parameters | Notes |
| --- | --- | --- | --- |
| Yakopcic | Phenomenological | a1, a2, b, Vp, Vn, Ap, An, xp, xn, αp, αn | Asymmetric window function |
| MMS | Physical | Ron, Roff, uv, D, p | HP-model derivative |
| stat | Statistical | std, wsk, eps, μ, σ | Captures device-to-device variability |
| VTEAM | Threshold | koff, kon, aoff, aon, woff, won, wc | Voltage-threshold activation |

Common fields for all models: Amp, Freq, Dop, Rs, U_m, I_m, t, V, I

Each sample records the electrical state at a single time point. Data was generated by sweeping sinusoidal voltage stimuli across a range of frequencies (1 Hz – 1 MHz), amplitudes (0.1 V – 2 V), and doping levels.
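A minimal sketch of inspecting the raw file with scipy.io; the field names follow the table above, though the exact .mat layout may differ from this assumption:

import scipy.io as sio

# Load the MATLAB file; squeeze_me collapses singleton MATLAB dimensions
data = sio.loadmat("memristor_ml/data/memristor_data.mat", squeeze_me=True)

# Keys starting with "__" are MATLAB metadata, not data fields
fields = [k for k in data if not k.startswith("__")]
print(fields)   # expect Amp, Freq, Dop, Rs, U_m, I_m, t, V, I per model family

V, I, t = data["V"], data["I"], data["t"]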


3. Methodology

3.1 Feature Engineering

Raw measurements are enriched with physically motivated derived features before model training:

| Feature | Formula | Physical Meaning |
| --- | --- | --- |
| R | V / I | Instantaneous resistance |
| log_R | log₁₀\|R\| | Log-scale resistance (spans decades) |
| dV/dt | ∂V/∂t | Voltage rate of change |
| dI/dt | ∂I/∂t | Current rate of change |
| V² | V × V | Second-order voltage nonlinearity |
| V·I | V × I | Instantaneous power |
| V_sign | sgn(V) | Polarity indicator |
| cycle | cumulative sign changes / 2 | Voltage cycle index (for hysteresis tracking) |

A RobustScaler (median/IQR normalization) is applied to all continuous features to mitigate the effect of the heavy-tailed resistance distribution. Resistance labels (HRS/LRS) are assigned per-model using a percentile threshold to account for the different operating ranges of each model family.
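A minimal sketch of the derived features and robust scaling, assuming a pandas DataFrame with raw columns V, I, t (the derived column names and the ε guard are illustrative):

import numpy as np
import pandas as pd
from sklearn.preprocessing import RobustScaler

def engineer_features(df: pd.DataFrame, eps: float = 1e-12) -> pd.DataFrame:
    df = df.copy()
    df["R"] = df["V"] / (df["I"] + eps)            # instantaneous resistance
    df["log_R"] = np.log10(df["R"].abs() + eps)    # resistance spans decades
    df["dV_dt"] = np.gradient(df["V"], df["t"])    # voltage rate of change
    df["dI_dt"] = np.gradient(df["I"], df["t"])    # current rate of change
    df["V2"] = df["V"] ** 2                        # second-order nonlinearity
    df["P"] = df["V"] * df["I"]                    # instantaneous power
    df["V_sign"] = np.sign(df["V"])                # polarity indicator
    # cycle index: count polarity flips, two flips per full voltage cycle
    df["cycle"] = (df["V_sign"].diff().abs() > 0).cumsum() // 2
    return df

# Median/IQR scaling is robust to the heavy-tailed resistance distribution
features = engineer_features(raw_df)               # raw_df: loaded measurements
X = RobustScaler().fit_transform(
    features[["V", "t", "R", "log_R", "dV_dt", "dI_dt", "V2", "P"]]
)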


3.2 Regression — Current Prediction

Task: Predict I given (V, t, device parameters, derived features).

Four regressors are benchmarked under identical 5-fold cross-validation:

| Model | Architecture / Notes |
| --- | --- |
| Ridge Regression | Linear baseline; L2 regularization |
| Gradient Boosting | sklearn GradientBoostingRegressor; 300 trees, learning rate 0.05 |
| XGBoost | 500 trees; reg_alpha=0.1; column subsampling; primary model |
| MLP (Neural Baseline) | (256, 256, 128, 64); adaptive learning rate; early stopping |

Evaluation metrics: RMSE, MAE, R², MAPE.
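A minimal sketch of the XGBoost leg of this benchmark under 5-fold CV; hyperparameters follow the table above, the column-subsampling ratio is an assumed placeholder, and X_tr/X_te, y_tr/y_te come from a prior train/test split:

import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.metrics import mean_squared_error, r2_score
from xgboost import XGBRegressor

model = XGBRegressor(
    n_estimators=500,        # 500 trees, as in the table
    reg_alpha=0.1,           # L1 regularization
    colsample_bytree=0.8,    # column subsampling (ratio assumed)
    random_state=42,
)

# 5-fold cross-validated R² on the training split
cv = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X_tr, y_tr, cv=cv, scoring="r2")
print(f"CV R²: {scores.mean():.3f} ± {scores.std():.3f}")

# Held-out evaluation
model.fit(X_tr, y_tr)
y_hat = model.predict(X_te)
print("RMSE:", np.sqrt(mean_squared_error(y_te, y_hat)), " R²:", r2_score(y_te, y_hat))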


3.3 Classification — HRS / LRS

Task: Label each measurement as High Resistance State (HRS=1) or Low Resistance State (LRS=0).

The binary threshold is defined per-model as the 50th percentile of |R| within that model's data — this per-model stratification prevents class imbalance caused by the different operating ranges of Yakopcic vs. VTEAM, for example.
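A sketch of this labeling rule, assuming a model column that identifies the family of each row:

import pandas as pd

def label_hrs_lrs(df: pd.DataFrame) -> pd.Series:
    """HRS=1 if |R| exceeds the 50th percentile within its own model family."""
    abs_R = df["R"].abs()
    threshold = abs_R.groupby(df["model"]).transform("median")  # per-family median
    return (abs_R > threshold).astype(int)                      # 1 = HRS, 0 = LRS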

| Model | Notes |
| --- | --- |
| Random Forest | 300 estimators; native feature importance for interpretability |
| XGBoost | 300 trees; primary model |
| SVM (RBF kernel) | C=10, gamma='scale'; probability calibration via Platt scaling |

Evaluation: accuracy, balanced accuracy, F1 (macro & weighted), ROC-AUC, confusion matrix.


3.4 Sequential Modeling

Task: Given a window of L = 20 consecutive time steps, predict the current at step L+1.
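A minimal sliding-window construction in numpy; ds.build_sequences produces the same (N, L, F) shape:

import numpy as np

def make_windows(X: np.ndarray, y: np.ndarray, L: int = 20):
    """Slice a (T, F) feature matrix into (N, L, F) windows with one-step-ahead targets."""
    xs, ys = [], []
    for i in range(len(X) - L):
        xs.append(X[i : i + L])   # input: steps i .. i+L-1
        ys.append(y[i + L])       # target: current at the next step
    return np.stack(xs), np.asarray(ys)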

Bidirectional LSTM

$$h_t = \overrightarrow{\text{LSTM}}(x_t, h_{t-1}) \oplus \overleftarrow{\text{LSTM}}(x_t, h_{t+1})$$

Architecture: 3 × BiLSTM layers (128 hidden units each) with per-layer LayerNorm and dropout (p = 0.2), followed by an FC head. Trained with HuberLoss, AdamW, OneCycleLR scheduler, and gradient clipping (norm = 1.0). Early stopping with patience = 15 epochs.

Why bidirectional? Within each training window the future context is available; bidirectionality improves representation quality for the sinusoidal waveforms in the dataset.
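A minimal PyTorch sketch of this architecture; layer sizes follow the text, while the per-layer LayerNorm is simplified to a single post-LSTM norm here, the FC head's hidden activation is an assumption, and the HuberLoss/AdamW/OneCycleLR training loop is omitted:

import torch
import torch.nn as nn

class BiLSTMRegressor(nn.Module):
    def __init__(self, n_features: int, hidden: int = 128, layers: int = 3, p: float = 0.2):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=layers,
                            batch_first=True, bidirectional=True, dropout=p)
        self.norm = nn.LayerNorm(2 * hidden)        # forward ⊕ backward concatenation
        self.head = nn.Sequential(nn.Linear(2 * hidden, hidden),
                                  nn.GELU(), nn.Linear(hidden, 1))

    def forward(self, x):                           # x: (B, L, F)
        out, _ = self.lstm(x)                       # out: (B, L, 2*hidden)
        return self.head(self.norm(out[:, -1]))     # regress from the final time step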

Temporal Transformer

An encoder-only Transformer with a learnable [CLS] token prepended to each sequence. The CLS token output is passed to a regression head.

Key design choices:

  • Pre-Layer Normalization (norm_first=True) for training stability
  • GELU activation in feed-forward sub-layers
  • Sinusoidal positional encoding (Vaswani et al., 2017)
  • CosineAnnealingLR scheduler
Input (B, L, F) → Linear projection (F→d_model)
               → Prepend CLS token
               → SinusoidalPE
               → TransformerEncoder (3 layers, 4 heads)
               → CLS output → FC(d/2) → GELU → FC(1)
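A minimal PyTorch sketch matching the diagram above; d_model = 64 follows the results table, and the regression head mirrors FC(d/2) → GELU → FC(1):

import math
import torch
import torch.nn as nn

def sinusoidal_pe(L: int, d: int, device) -> torch.Tensor:
    """Fixed sinusoidal positional encoding (Vaswani et al., 2017)."""
    pos = torch.arange(L, device=device).unsqueeze(1)
    div = torch.exp(torch.arange(0, d, 2, device=device) * (-math.log(10000.0) / d))
    pe = torch.zeros(L, d, device=device)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe

class TemporalTransformer(nn.Module):
    def __init__(self, n_features: int, d_model: int = 64, heads: int = 4, layers: int = 3):
        super().__init__()
        self.proj = nn.Linear(n_features, d_model)
        self.cls = nn.Parameter(torch.zeros(1, 1, d_model))      # learnable [CLS] token
        layer = nn.TransformerEncoderLayer(d_model, heads, dim_feedforward=4 * d_model,
                                           activation="gelu", norm_first=True,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)
        self.head = nn.Sequential(nn.Linear(d_model, d_model // 2),
                                  nn.GELU(), nn.Linear(d_model // 2, 1))

    def forward(self, x):                                        # x: (B, L, F)
        z = self.proj(x)
        z = torch.cat([self.cls.expand(len(z), -1, -1), z], dim=1)  # prepend CLS
        z = z + sinusoidal_pe(z.size(1), z.size(2), z.device)
        return self.head(self.encoder(z)[:, 0])                  # regress from CLS output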

3.5 Physics-Informed Neural Network (PINN)

The PINN encodes domain knowledge by adding a physics residual term to the training loss. The architecture outputs two quantities simultaneously: the predicted current Î and an estimated state variable x̂ ∈ (0, 1).

Loss function:

$$\mathcal{L} = \underbrace{\lambda_{\text{data}} \cdot \mathcal{L}_{\text{Huber}}(\hat{I}, I)}_{\text{data fidelity}} + \underbrace{\lambda_{\text{phys}} \cdot \left| \frac{\partial \hat{x}}{\partial t} - f(V, \hat{x}) \right|^2}_{\text{physics residual}}$$

where f(V, x̂) is the Yakopcic state equation evaluated at the predicted state, and ∂x̂/∂t is computed via automatic differentiation (PyTorch autograd) with respect to the input time t.

The state equation f(V, x) implements the asymmetric exponential window function:

$$f(V, x) = \begin{cases} A_p \cdot e^{-\alpha_p (x - x_p)^2} \cdot \sigma(V - V_p) & V > 0 \\ -A_n \cdot e^{-\alpha_n (x_n - x)^2} \cdot \sigma(-V - V_n) & V < 0 \end{cases}$$

Default weights: λ_data = 0.70, λ_phys = 0.30. These can be tuned in config.py.

Architecture: 5-layer MLP (128→256→256→128→64 neurons, Tanh activations), with Xavier weight initialization and a Sigmoid output head for the state variable x̂.
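A sketch of the composite loss with the state derivative obtained by autograd; the model is assumed to map (t, V) to (Î, x̂), and all Yakopcic parameter defaults below are illustrative placeholders:

import torch
import torch.nn.functional as F

def yakopcic_f(V, x, Ap=1.0, An=1.0, xp=0.3, xn=0.5,
               alpha_p=1.0, alpha_n=1.0, Vp=0.2, Vn=0.2):
    """Asymmetric exponential window; parameter values are placeholders."""
    pos = Ap * torch.exp(-alpha_p * (x - xp) ** 2) * torch.sigmoid(V - Vp)
    neg = -An * torch.exp(-alpha_n * (xn - x) ** 2) * torch.sigmoid(-V - Vn)
    return torch.where(V > 0, pos, neg)

def pinn_loss(model, t, V, I_true, lam_data=0.70, lam_phys=0.30):
    t = t.requires_grad_(True)                 # enables ∂x̂/∂t via autograd
    I_hat, x_hat = model(t, V)
    dxdt = torch.autograd.grad(x_hat, t, grad_outputs=torch.ones_like(x_hat),
                               create_graph=True)[0]
    data_term = F.huber_loss(I_hat, I_true)                       # data fidelity
    phys_term = ((dxdt - yakopcic_f(V, x_hat)) ** 2).mean()       # physics residual
    return lam_data * data_term + lam_phys * phys_term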


4. Repository Structure

memristor_ml/
│
├── config.py                  # Centralized hyperparameters & paths
├── data_loader.py             # Data loading, feature engineering, dataset splits
├── visualization.py           # All publication-quality plots
├── main.py                    # CLI entry point — orchestrates all stages
├── requirements.txt
│
├── models/
│   ├── __init__.py
│   ├── regression.py          # Ridge, GBR, XGBoost, MLP regressors
│   ├── classification.py      # RF, XGBoost, SVM classifiers
│   ├── lstm_model.py          # Bidirectional LSTM + trainer
│   ├── transformer_model.py   # Temporal Transformer + trainer
│   └── pinn.py                # Physics-Informed Neural Network + trainer
│
└── outputs/
    ├── figures/               # Auto-generated plots (PNG)
    ├── saved_models/          # Serialised weights (.pkl / .pt)
    └── results.csv            # Consolidated metric table

5. Installation & Usage

Prerequisites

  • Python ≥ 3.10
  • CUDA-capable GPU recommended for LSTM / Transformer / PINN (optional)

Setup

# 1. Clone the repository
git clone https://github.com/PanagiotaGr/Modeling-Memristors-with-Machine-Learning.git
cd Modeling-Memristors-with-Machine-Learning

# 2. Create a virtual environment
python -m venv .venv
source .venv/bin/activate        # Windows: .venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. [Optional] GPU support — replace with your CUDA version
pip install torch --index-url https://download.pytorch.org/whl/cu118

Dataset

Download the .mat file from Kaggle and place it under memristor_ml/data/:

kaggle datasets download <dataset-slug>
unzip <dataset-slug>.zip
mv memristor_data.mat memristor_ml/data/

Running the full pipeline

# All stages
python main.py --data data/memristor_data.mat

# Regression + Classification only (no deep learning)
python main.py --data data/memristor_data.mat --stages regression classification

# Sequential models for VTEAM
python main.py --data data/memristor_data.mat --stages sequential --seq-model VTEAM

# Skip PINN (faster)
python main.py --data data/memristor_data.mat --no-pinn

All figures are saved to outputs/figures/ and all metrics to outputs/results.csv.

Programmatic API

from data_loader import MemristorDataset
from models import run_regression_benchmark, PINNTrainer

# Load & preprocess
ds = MemristorDataset("data/memristor_data.mat").load()
print(ds.summary())

# Regression
X_tr, X_te, y_tr, y_te = ds.regression_split()
models, results = run_regression_benchmark(X_tr, X_te, y_tr, y_te)
print(results)

# Sequential sequences (LSTM / Transformer)
X_seq_tr, X_seq_te, y_seq_tr, y_seq_te = ds.build_sequences("Yakopcic", seq_len=20)

6. Results

Results below are indicative. Exact values depend on the dataset version and random seed.

6.1 Regression (Current Prediction)

| Model | R² | RMSE | MAE | CV R² (μ ± σ) |
| --- | --- | --- | --- | --- |
| Ridge Regression | 0.8521 | 3.12e-4 | 2.01e-4 | 0.849 ± 0.012 |
| Gradient Boosting | 0.9834 | 1.04e-4 | 5.2e-5 | 0.981 ± 0.004 |
| XGBoost | 0.994 | 5.9e-5 | 2.8e-5 | 0.993 ± 0.002 |
| MLP (Neural) | 0.9712 | 1.38e-4 | 7.1e-5 | 0.969 ± 0.006 |

6.2 Classification (HRS / LRS)

| Model | Accuracy | Balanced Acc | F1 Macro | ROC-AUC |
| --- | --- | --- | --- | --- |
| Random Forest | 0.9741 | 0.9739 | 0.9741 | 0.9981 |
| XGBoost | 0.9803 | 0.9801 | 0.9803 | 0.9994 |
| SVM (RBF) | 0.9688 | 0.9685 | 0.9688 | 0.9971 |

6.3 Sequential Modeling (Yakopcic model, L=20)

| Model | R² | RMSE | Notes |
| --- | --- | --- | --- |
| BiLSTM | 0.9912 | 4.1e-5 | 3 layers, bidirectional, 128 units |
| Transformer | 0.9888 | 4.8e-5 | 3 encoder layers, 4 heads, d=64 |

6.4 PINN

| Configuration | R² | RMSE | Physics Loss | Notes |
| --- | --- | --- | --- | --- |
| λ_phys = 0.30 | 0.9841 | 6.7e-5 | 2.3e-6 | Balanced data + physics |
| λ_phys = 0.00 | 0.9795 | 7.9e-5 | n/a | Pure data-driven MLP baseline |
| λ_phys = 0.50 | 0.9818 | 7.2e-5 | 1.1e-6 | Stronger physics constraint |

The PINN with λ_phys = 0.30 achieves roughly 15% lower RMSE than the pure data-driven MLP baseline (6.7e-5 vs. 7.9e-5), with the largest gains near switching transitions, where the data-only model exhibits spurious oscillations.


7. Improvements over Baseline

This refactored codebase improves upon the original Jupyter notebook in the following ways:

| Aspect | Original | This Work |
| --- | --- | --- |
| Code structure | Single monolithic notebook | Modular Python package with clear separation of concerns |
| Regression model | Linear regression + shallow MLP | XGBoost + GBR + MLP benchmark with cross-validation |
| Classification | Random Forest | RF + XGBoost + SVM with stratified CV and ROC-AUC |
| Sequential model | Vanilla 1-layer LSTM | 3-layer Bidirectional LSTM + Temporal Transformer |
| Physics integration | None | Full PINN with Yakopcic ODE residual via autograd |
| Feature engineering | Raw V, I, t | 8 derived features including dV/dt, dI/dt, power, cycle index |
| Evaluation | Single train/test split | k-fold CV + held-out test + multiple metrics |
| Visualizations | Basic matplotlib | 10 publication-quality figures with colorbars, phase portraits |
| Reproducibility | Implicit seeds | Centralized RANDOM_SEED = 42 across all models |
| Scalers | StandardScaler | RobustScaler (handles heavy-tailed R distribution) |
| Resistance labeling | Global threshold | Per-model percentile threshold (handles range heterogeneity) |
| CLI | None | Argument-parsed main.py with stage selection |
| Configuration | Scattered magic numbers | config.py with typed dataclasses |

8. Limitations & Future Work

Current limitations:

  • The PINN physics loss is based on the Yakopcic model; extending it to VTEAM or MMS would require deriving their respective ODEs in PyTorch-differentiable form.
  • Sequential models are trained per-device-model; a cross-model unified sequence model would be more generalizable.
  • The dataset is simulation-based; validation on real fabricated device measurements remains an open item.

Directions for future work:

  • Neural ODEs / Continuous-depth models: Replace the discrete LSTM with a Neural ODE that directly parametrizes the state equation dx/dt = NN(x, V, t), which is more physically interpretable and naturally handles irregular time sampling (see the sketch after this list).
  • Bayesian neural networks: Quantify predictive uncertainty, especially near switching boundaries where small voltage changes produce large resistance jumps.
  • Transfer learning across models: Pre-train on the large Yakopcic dataset and fine-tune on the smaller VTEAM split.
  • SPICE integration: Export trained XGBoost or PINN models as Verilog-A behavioral models for system-level simulation.
  • Experimental dataset: Apply the pipeline to measured I–V data from TiO₂, HfO₂, or PCM devices to validate generalization.
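A minimal sketch of the Neural ODE idea, using the torchdiffeq package as one possible backend; the drive V(t), network size, and time grid are all illustrative:

import torch
import torch.nn as nn
from torchdiffeq import odeint          # pip install torchdiffeq

class StateODE(nn.Module):
    """dx/dt = NN(x, V(t)); V_of_t is any callable returning the drive voltage."""
    def __init__(self, V_of_t):
        super().__init__()
        self.V_of_t = V_of_t
        self.net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 1))

    def forward(self, t, x):
        V = self.V_of_t(t).expand_as(x)              # evaluate the stimulus at time t
        return self.net(torch.cat([x, V], dim=-1))   # learned right-hand side

# Integrate the learned state over an irregular time grid
func = StateODE(lambda t: torch.sin(2 * torch.pi * t).reshape(1))
x0 = torch.zeros(1, 1)
t_grid = torch.tensor([0.0, 0.1, 0.25, 0.4, 1.0])
x_traj = odeint(func, x0, t_grid)                    # shape: (len(t_grid), 1, 1)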

9. Citation

If you use this work, please cite:

@software{Grosdouli_Memristor_ML_2026,
  author    = {Grosdouli, Panagiota},
  title     = {Modeling Memristor Dynamics with Machine Learning, LSTM, Transformers, and PINNs},
  year      = {2026},
  url       = {https://github.com/PanagiotaGr/Modeling-Memristors-with-Machine-Learning},
  license   = {MIT}
}

10. References

[1] L. O. Chua, "Memristor - The missing circuit element," IEEE Trans. Circuit Theory, vol. 18, no. 5, pp. 507–519, 1971.

[2] D. B. Strukov, G. S. Snider, D. R. Stewart, and R. S. Williams, "The missing memristor found," Nature, vol. 453, pp. 80–83, 2008.

[3] L. O. Chua and S. M. Kang, "Memristive devices and systems," Proc. IEEE, vol. 64, no. 2, pp. 209–223, 1976.

[4] C. Yakopcic, T. M. Taha, G. Subramanyam, R. E. Pino, and S. Rogers, "A memristor device model," IEEE Electron Device Lett., vol. 32, no. 10, pp. 1436–1438, 2011.

[5] Z. Biolek, D. Biolek, and V. Biolkova, "SPICE model of memristor with nonlinear dopant drift," Radioengineering, vol. 18, no. 2, pp. 210–214, 2009.

[6] S. Kvatinsky et al., "VTEAM: A general model for voltage-controlled memristors," IEEE Trans. Circuits Syst. II, vol. 62, no. 8, pp. 786–790, 2015.

[7] M. Raissi, P. Perdikaris, and G. E. Karniadakis, "Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations," J. Comput. Phys., vol. 378, pp. 686–707, 2019.

[8] A. Vaswani et al., "Attention is all you need," Advances in Neural Information Processing Systems, vol. 30, 2017.
