RAG + LLMs for Perovskite Stability (T80) Discovery

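Generate novel ABX₃ perovskite candidates with longer shelf life, pre-rank them with a surrogate model that predicts T80 (the time for device efficiency to fall to 80% of its initial value), then rerank the Top-25 with retrieval-augmented (RAG) LLMs (Qwen, Llama, Flan) that ground their judgments in evidence from the literature.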

Why this? Fine-tuning LLMs directly on raw tabular data led to hallucinations. Using LLMs for what they do best, reading and synthesizing papers, and pairing them with a transparent surrogate yields consistent, defensible rerankings.

Repository

├─ train_surrogate_predictor.py   # train T80 surrogate & feature space
├─ generate_and_score.py          # sample/validate ABX3, predict T80, rank & export
├─ Qwen_Notebook.ipynb            # RAG + JSON scoring for Qwen
├─ lLama_Notebook.ipynb           # RAG + JSON scoring for Llama
├─ Flan_Notebook.ipynb            # RAG + JSON scoring for Flan
└─ Untitled Diagram.jpg           # schematic

Setup

# Python 3.10+ recommended
pip install -U numpy pandas scikit-learn xgboost joblib tqdm
# Notebooks may also need: transformers sentence-transformers chromadb

A GPU is optional, but it speeds up the embedding and LLM steps in the notebooks.

Data

rows_with_T80.csv — cleaned historical dataset with T80
A folder of papers (PDF/HTML) for retrieval (used by notebooks)
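T80 is the target throughout. As a reminder of what the number means, the sketch below reads T80 off a single efficiency-versus-time trace by linear interpolation. It assumes the trace is sorted by time and normalizes to the first measurement (some studies normalize to a post-burn-in peak instead); the helper `t80_from_trace` is hypothetical and not necessarily how rows_with_T80.csv was built.

```python
from typing import Optional, Sequence


def t80_from_trace(times_h: Sequence[float], pce: Sequence[float]) -> Optional[float]:
    """Hypothetical helper: first time (h) at which PCE falls to 80% of its initial value.

    Assumes times are sorted ascending and normalizes by the first measurement.
    Returns None if the trace never reaches the 80% threshold.
    """
    if not pce or len(times_h) != len(pce):
        raise ValueError("times_h and pce must be non-empty and of equal length")
    if pce[0] <= 0:
        raise ValueError("initial PCE must be positive")
    threshold = 0.8 * pce[0]
    for i in range(1, len(pce)):
        if pce[i] <= threshold:
            # Interpolate between the last point above and the first point at/below threshold.
            t0, t1 = times_h[i - 1], times_h[i]
            p0, p1 = pce[i - 1], pce[i]
            return t0 + (p0 - threshold) * (t1 - t0) / (p0 - p1)
    return None


# Example: a 20% device that has dropped to 15% by 1000 h.
print(t80_from_trace([0, 500, 1000], [20.0, 17.0, 15.0]))  # 750.0
```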

Quickstart

1) Train the surrogate

python train_surrogate_predictor.py \
  --data rows_with_T80.csv \
  --out t80_surrogate_xgb.joblib \
  --feature-space feature_space.json

Outputs

t80_surrogate_xgb.joblib — trained model pipeline
feature_space.json — allowed ions/tokens + radii map (for chemistry checks)
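feature_space.json is described only as "allowed ions/tokens + radii map". As an illustrative assumption, the sketch below treats the allowed ions as a fixed, ordered vocabulary per site and turns a mixed composition into a fraction vector that a tabular model such as the XGBoost surrogate (t80_surrogate_xgb.joblib) could consume. The JSON layout, its keys, and `featurize` are hypothetical; the real file may differ.

```python
import json

# Hypothetical layout for the allowed-ion part of feature_space.json.
FEATURE_SPACE = json.loads("""
{
  "A": ["Cs", "FA", "MA"],
  "B": ["Pb", "Sn"],
  "X": ["Br", "Cl", "I"]
}
""")


def featurize(comp: dict, space: dict = FEATURE_SPACE) -> list:
    """Map {'A': {ion: frac}, 'B': {...}, 'X': {...}} to a fixed-order fraction vector.

    Raises ValueError for ions outside the allowed vocabulary, i.e. a
    'stay inside the feature space' check.
    """
    vec = []
    for site in ("A", "B", "X"):
        allowed = space[site]
        unknown = set(comp[site]) - set(allowed)
        if unknown:
            raise ValueError(f"ions {sorted(unknown)} not allowed on site {site}")
        # Absent ions contribute 0.0 so every vector has the same length and order.
        vec.extend(float(comp[site].get(ion, 0.0)) for ion in allowed)
    return vec


cand = {"A": {"FA": 0.85, "Cs": 0.15}, "B": {"Pb": 1.0}, "X": {"I": 2.7, "Br": 0.3}}
print(featurize(cand))  # [0.15, 0.85, 0.0, 1.0, 0.0, 0.3, 0.0, 2.7]
```

A fixed ordering matters because a trained model expects column i to mean the same ion for every candidate.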

2) Generate & score candidates (ABX₃)

python generate_and_score.py \
  --goal 1000 \
  --allow-pb 1 \
  --n 8000 \
  --workers 6 \
  --model t80_surrogate_xgb.joblib \
  --feature-space feature_space.json \
  --existing rows_with_T80.csv

What happens

Samples A/B/X ions within the feature space and enforces charge neutrality plus windows on the Goldschmidt tolerance factor (t) and the octahedral factor (μ)
Predicts log10(T80) with the surrogate and computes novelty against historical tokens
Ranks by meets_goal, then pred_T80_h, then novelty
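Both structural screens are standard geometric descriptors: t = (r_A + r_X) / (√2 · (r_B + r_X)) and μ = r_B / r_X, with mixed sites averaged by mole fraction. The sketch below computes them together with an ABX₃ charge-balance check. The radii (Shannon-type values for Cs⁺, Pb²⁺ and the halides, commonly used effective radii for MA⁺/FA⁺, an approximate value for Sn²⁺), the formal charges, the bounds, and the function `passes_chemistry_screen` are all illustrative assumptions; the repository's actual windows live in `within_bounds` in generate_and_score.py.

```python
import math

# Illustrative radii in Å (assumed values, not read from feature_space.json).
RADII = {"Cs": 1.88, "MA": 2.17, "FA": 2.53, "Pb": 1.19, "Sn": 1.10,
         "I": 2.20, "Br": 1.96, "Cl": 1.81}
# Assumed formal charges for the same ions.
CHARGE = {"Cs": 1, "MA": 1, "FA": 1, "Pb": 2, "Sn": 2, "I": -1, "Br": -1, "Cl": -1}


def site_radius(fracs: dict) -> float:
    """Mole-fraction-weighted radius of one site (fractions normalized to sum to 1)."""
    total = sum(fracs.values())
    return sum(RADII[ion] * f for ion, f in fracs.items()) / total


def tolerance_factors(comp: dict) -> tuple:
    """Return (t, mu) for {'A': {...}, 'B': {...}, 'X': {...}}."""
    r_a, r_b, r_x = (site_radius(comp[s]) for s in ("A", "B", "X"))
    t = (r_a + r_x) / (math.sqrt(2) * (r_b + r_x))  # Goldschmidt tolerance factor
    mu = r_b / r_x                                   # octahedral factor
    return t, mu


def net_charge(comp: dict) -> float:
    """Net charge per formula unit; X amounts are stoichiometric (sum to 3)."""
    return sum(CHARGE[ion] * f for site in comp.values() for ion, f in site.items())


def passes_chemistry_screen(comp: dict, t_range=(0.8, 1.0),
                            mu_range=(0.44, 0.90), tol=1e-6) -> bool:
    """Hypothetical screen: charge neutral and t, mu inside the (assumed) windows."""
    t, mu = tolerance_factors(comp)
    return (abs(net_charge(comp)) < tol
            and t_range[0] <= t <= t_range[1]
            and mu_range[0] <= mu <= mu_range[1])


cspbi3 = {"A": {"Cs": 1.0}, "B": {"Pb": 1.0}, "X": {"I": 3.0}}
t, mu = tolerance_factors(cspbi3)
print(f"t={t:.3f}, mu={mu:.3f}, ok={passes_chemistry_screen(cspbi3)}")
# t=0.851, mu=0.541, ok=True
```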

Output

top_candidates.csv (up to the top 500 candidates)
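The ranking rule is a lexicographic sort. The sketch below assumes the surrogate output is log10(T80) in hours, that meets_goal means pred_T80_h ≥ --goal, that higher novelty is better, and defines novelty as 1 minus the best Jaccard overlap between a candidate's ion tokens and any historical composition. That novelty definition, the input fields `tokens` and `pred_log10_T80`, and `rank_candidates` are hypothetical; generate_and_score.py may compute novelty differently.

```python
from typing import Dict, FrozenSet, List, Sequence


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Set overlap |a ∩ b| / |a ∪ b|; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def novelty(tokens: FrozenSet[str], history: Sequence[FrozenSet[str]]) -> float:
    """Assumed definition: 1 - max Jaccard similarity to any historical token set."""
    if not history:
        return 1.0
    return 1.0 - max(jaccard(tokens, h) for h in history)


def rank_candidates(cands: List[Dict], history: Sequence[FrozenSet[str]],
                    goal_h: float, top_k: int = 500) -> List[Dict]:
    """Attach pred_T80_h, meets_goal and novelty, then sort and cap at top_k."""
    rows = []
    for c in cands:
        pred_h = 10 ** c["pred_log10_T80"]  # back-transform the log10 prediction
        rows.append({**c,
                     "pred_T80_h": pred_h,
                     "meets_goal": pred_h >= goal_h,
                     "novelty": novelty(frozenset(c["tokens"]), history)})
    # Descending on (meets_goal, pred_T80_h, novelty); True sorts above False.
    rows.sort(key=lambda r: (r["meets_goal"], r["pred_T80_h"], r["novelty"]), reverse=True)
    return rows[:top_k]


history = [frozenset({"MA", "Pb", "I"}), frozenset({"Cs", "Pb", "Br"})]
cands = [
    {"name": "CsPbI3", "tokens": ["Cs", "Pb", "I"], "pred_log10_T80": 3.2},
    {"name": "FA-Cs-PbI3", "tokens": ["FA", "Cs", "Pb", "I"], "pred_log10_T80": 3.2},
    {"name": "MAPbI3", "tokens": ["MA", "Pb", "I"], "pred_log10_T80": 2.5},
]
ranked = rank_candidates(cands, history, goal_h=1000)
print([r["name"] for r in ranked])  # ['FA-Cs-PbI3', 'CsPbI3', 'MAPbI3']
```

The two 3.2 candidates tie on predicted T80, so novelty breaks the tie; MAPbI3 falls below the goal and sinks regardless of its score.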

3) RAG + LLM reranking (Top-25)

Open and run these notebooks (set the path to top_candidates.csv at the top of each):

Qwen_Notebook.ipynb
lLama_Notebook.ipynb
Flan_Notebook.ipynb

Each notebook:

Builds or loads a small RAG index of papers (Nature, Joule, NREL, …).
For each Top-25 candidate, retrieves evidence and prompts the model to return JSON with the keys:
viability_score, consistency_score, risks, notes, cites.
Saves per-model CSVs and an optional HTML leaderboard (side-by-side model Top-10 + sortable Top-25).
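The notebooks use an embedding-based index (the setup mentions sentence-transformers and chromadb). To show the retrieve-then-prompt flow without those dependencies, the sketch below swaps in a plain bag-of-words cosine retriever. `retrieve`, `build_prompt`, the placeholder passages, and the prompt wording are hypothetical; only the five JSON keys come from this README.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def _bow(text: str) -> Counter:
    """Lowercased word counts; a stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, passages: List[Tuple[str, str]], k: int = 3) -> List[Tuple[str, str]]:
    """Return the k (source_id, text) passages most similar to the query."""
    q = _bow(query)
    return sorted(passages, key=lambda p: _cosine(q, _bow(p[1])), reverse=True)[:k]


JSON_KEYS = ["viability_score", "consistency_score", "risks", "notes", "cites"]


def build_prompt(candidate: str, pred_t80_h: float, evidence: List[Tuple[str, str]]) -> str:
    """Assemble a grounded prompt that asks for a JSON object with the five keys."""
    context = "\n".join(f"[{sid}] {text}" for sid, text in evidence)
    return (f"Candidate perovskite: {candidate}\n"
            f"Surrogate-predicted T80: {pred_t80_h:.0f} h\n\n"
            f"Evidence:\n{context}\n\n"
            "Using only the evidence above, return a JSON object with keys "
            + ", ".join(JSON_KEYS)
            + ". List the bracketed source ids you relied on in 'cites'.")


# Placeholder passages; real ones would be chunks of the indexed papers.
passages = [
    ("paper_a", "placeholder passage about Cs FA mixed cation lead iodide stability"),
    ("paper_b", "placeholder passage about tin oxidation"),
    ("paper_c", "placeholder passage about encapsulation"),
]
evidence = retrieve("Cs FA lead iodide stability", passages, k=1)
print(build_prompt("FA0.85Cs0.15PbI3", 1585.0, evidence))
```

Asking the model to cite bracketed source ids makes it possible to check afterwards that each claim points at a retrieved passage.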

Typical outputs

reranked_qwen.csv, reranked_llama.csv, reranked_flan.csv
leaderboard_demo_side_by_side.html
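LLM replies often wrap the requested JSON in prose, so something has to extract and validate the object before scores reach the reranked_*.csv files. A minimal sketch, assuming a 0 to 1 score scale (the README does not state the scale, so it is a parameter); `parse_llm_json` is hypothetical and not taken from the notebooks.

```python
import json
import re
from typing import Optional

REQUIRED = ("viability_score", "consistency_score", "risks", "notes", "cites")


def parse_llm_json(reply: str, score_range=(0.0, 1.0)) -> Optional[dict]:
    """Extract and validate the JSON object in an LLM reply.

    Takes the span from the first '{' to the last '}'. Returns None when no
    parseable object with all required keys is found, so the caller can retry
    or skip the candidate.
    """
    match = re.search(r"\{.*\}", reply, flags=re.DOTALL)
    if not match:
        return None
    try:
        obj = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict) or any(k not in obj for k in REQUIRED):
        return None
    lo, hi = score_range
    for key in ("viability_score", "consistency_score"):
        try:
            obj[key] = min(hi, max(lo, float(obj[key])))  # clamp to the assumed scale
        except (TypeError, ValueError):
            return None
    # Normalize list-like fields so they serialize consistently to CSV.
    for key in ("risks", "cites"):
        if isinstance(obj[key], str):
            obj[key] = [obj[key]]
    return obj


reply = ('Here is my assessment: {"viability_score": 1.3, "consistency_score": 0.7, '
         '"risks": "phase segregation", "notes": "n/a", "cites": ["paper_a"]} Hope this helps.')
print(parse_llm_json(reply))
```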

(Optional) Combine per-model results into one Top-25 leaderboard

import json
import pandas as pd

def _canon(x):
    # Canonical JSON for an ion-composition cell; plain tokens (e.g. "Pb") pass through unchanged.
    if isinstance(x, str):
        try:
            x = json.loads(x)
        except json.JSONDecodeError:
            return x
    return json.dumps(x, sort_keys=True)

def _keyify(df):
    # Add a stable cand_key (A|B|X|additives) unless the CSV already carries one.
    if "cand_key" in df.columns:
        return df
    df = df.copy()
    additives = df["additives"].astype(str) if "additives" in df.columns else ""
    df["cand_key"] = (df["A"].map(_canon) + "|" + df["B"].map(_canon) + "|"
                      + df["X"].map(_canon) + "|" + additives)
    return df

# Surrogate Top-25 is the backbone; each model's scores are left-joined onto it.
base = (_keyify(pd.read_csv("top_candidates.csv"))
        .sort_values("pred_T80_h", ascending=False)
        .head(25))

out = base[["cand_key", "A", "B", "X", "additives", "pred_T80_h"]]
for model in ("qwen", "llama", "flan"):
    scores = _keyify(pd.read_csv(f"reranked_{model}.csv"))
    scores = (scores[["cand_key", f"final_{model}", "llm_viability"]]
              .drop_duplicates("cand_key")  # one row per candidate, so the merge cannot add rows
              .rename(columns={"llm_viability": f"llm_viability_{model}"}))
    out = out.merge(scores, on="cand_key", how="left")

out.to_csv("leaderboard_top25_merged.csv", index=False)
print("saved leaderboard_top25_merged.csv")

Notes

Tune the chemistry windows in generate_and_score.py (the within_bounds limits on t and μ) and the --novelty-floor threshold.
Large models can be loaded in 4-bit or 8-bit precision in the notebooks so they fit on consumer GPUs.

License

MIT License.

Summary

Building on our T80 surrogate and cleaned MaterialsZone data, we added literature-grounded RAG + LLM reranking. This repo ships constrained candidate generation, Top-25 JSON scoring with notes, and a sortable dashboard with side-by-side model ranks.
