Generate novel ABX₃ perovskite candidates with longer shelf-life, pre-rank them with a T80 surrogate model, then rerank the Top-25 with RAG-assisted LLMs (Qwen, Llama, Flan) using evidence from the literature.
Why this? Direct LLM fine-tuning on raw tabular data led to hallucinations. Using LLMs for what they do best, reading and synthesizing papers, yields consistent, defensible rerankings when paired with a transparent surrogate.
├─ train_surrogate_predictor.py # train T80 surrogate & feature space
├─ generate_and_score.py # sample/validate ABX3, predict T80, rank & export
├─ Qwen_Notebook.ipynb # RAG + JSON scoring for Qwen
├─ lLama_Notebook.ipynb # RAG + JSON scoring for Llama
├─ Flan_Notebook.ipynb # RAG + JSON scoring for Flan
└─ Untitled Diagram.jpg # schematic
# Python 3.10+ recommended
pip install -U numpy pandas scikit-learn xgboost joblib tqdm
# Notebooks may also need: transformers sentence-transformers chromadb
GPU is optional; it helps for the embedding and LLM steps in the notebooks.
- rows_with_T80.csv: cleaned historical dataset with T80
- A folder of papers (PDF/HTML) for retrieval (used by notebooks)
python train_surrogate_predictor.py \
--data rows_with_T80.csv \
--out t80_surrogate_xgb.joblib \
--feature-space feature_space.json
Outputs
- t80_surrogate_xgb.joblib: trained model pipeline
- feature_space.json: allowed ions/tokens + radii map (for chemistry checks)
python generate_and_score.py \
--goal 1000 \
--allow-pb 1 \
--n 8000 \
--workers 6 \
--model t80_surrogate_xgb.joblib \
--feature-space feature_space.json \
--existing rows_with_T80.csv
What happens
- Samples A/B/X within feature space, enforces charge neutrality & tolerance factors (t, μ)
- Predicts log10(T80) via surrogate; computes novelty vs historical tokens
- Ranking: meets_goal → pred_T80_h → novelty
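The chemistry checks in the sampling step can be sketched as follows. This is a minimal illustration, not the code in generate_and_score.py. It assumes the standard Goldschmidt tolerance factor t = (r_A + r_X) / (√2 (r_B + r_X)) and octahedral factor μ = r_B / r_X. The radii values, the acceptance windows, and all function names here are illustrative assumptions; the repo takes its radii from feature_space.json.

```python
import math

# Approximate ionic radii in angstroms, as commonly quoted in the perovskite
# literature (assumed values for illustration only).
RADII = {"Cs": 1.88, "MA": 2.17, "FA": 2.53, "Pb": 1.19, "Sn": 1.10,
         "I": 2.20, "Br": 1.96, "Cl": 1.81}
CHARGES = {"Cs": 1, "MA": 1, "FA": 1, "Pb": 2, "Sn": 2, "I": -1, "Br": -1, "Cl": -1}


def mix(table, fractions):
    """Composition-weighted average of a per-ion property for one site."""
    return sum(table[ion] * frac for ion, frac in fractions.items())


def tolerance_factor(r_a, r_b, r_x):
    """Goldschmidt tolerance factor t."""
    return (r_a + r_x) / (math.sqrt(2) * (r_b + r_x))


def octahedral_factor(r_b, r_x):
    """Octahedral factor mu."""
    return r_b / r_x


def is_charge_neutral(a, b, x, tol=1e-9):
    """ABX3 is neutral when q_A + q_B + 3 * q_X = 0 (site-averaged charges)."""
    return abs(mix(CHARGES, a) + mix(CHARGES, b) + 3 * mix(CHARGES, x)) < tol


def passes_checks(a, b, x, t_window=(0.8, 1.0), mu_window=(0.44, 0.90)):
    """Apply neutrality plus (assumed) t and mu windows to one candidate."""
    r_a, r_b, r_x = mix(RADII, a), mix(RADII, b), mix(RADII, x)
    t = tolerance_factor(r_a, r_b, r_x)
    mu = octahedral_factor(r_b, r_x)
    return (is_charge_neutral(a, b, x)
            and t_window[0] <= t <= t_window[1]
            and mu_window[0] <= mu <= mu_window[1])


# Site occupancies are given as {ion: fraction} maps that sum to 1 per site.
print(passes_checks({"MA": 1.0}, {"Pb": 1.0}, {"I": 1.0}))
```

Mixed sites work the same way, for example an A site of {"MA": 0.5, "FA": 0.5}.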
Output
- top_candidates.csv (up to top 500)
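The ranking order (meets_goal, then pred_T80_h, then novelty) maps onto a multi-key descending sort. The sketch below assumes that the DataFrame columns share the names of the ranking keys and that meets_goal means the predicted T80 is at or above the --goal value; neither detail is confirmed here, and rank_candidates is a hypothetical helper.

```python
import pandas as pd


def rank_candidates(df, goal_h=1000.0, top_k=500):
    """Sort by meets_goal, then pred_T80_h, then novelty (all descending)."""
    ranked = df.copy()
    # Assumed definition: a candidate meets the goal if predicted T80 >= goal.
    ranked["meets_goal"] = ranked["pred_T80_h"] >= goal_h
    ranked = ranked.sort_values(
        ["meets_goal", "pred_T80_h", "novelty"],
        ascending=[False, False, False],
    )
    return ranked.head(top_k).reset_index(drop=True)


# Toy data with placeholder labels, not real candidates.
demo = pd.DataFrame({
    "formula": ["a", "b", "c"],
    "pred_T80_h": [1200.0, 800.0, 1200.0],
    "novelty": [0.2, 0.9, 0.7],
})
print(rank_candidates(demo))
```

Novelty only breaks ties between candidates with the same predicted T80, so a highly novel candidate that misses the goal still ranks below every candidate that meets it.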
Open and run these notebooks (set top_candidates.csv path at the top):
- Qwen_Notebook.ipynb
- lLama_Notebook.ipynb
- Flan_Notebook.ipynb
Each notebook:
- Builds/uses a small RAG index of papers (Nature, Joule, NREL, …).
- For Top-25 candidates, retrieves evidence and prompts the model to return JSON: viability_score, consistency_score, risks, notes, cites.
- Saves per-model CSVs and an optional HTML leaderboard (side-by-side model Top-10 + sortable Top-25).
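Model replies often wrap the JSON in extra prose, so it helps to validate them before saving scores. The following is a minimal sketch of such a check, assuming the five keys listed above and numeric scores; parse_llm_json is a hypothetical helper, and the notebooks may handle this differently.

```python
import json
import re

REQUIRED_KEYS = ("viability_score", "consistency_score", "risks", "notes", "cites")


def parse_llm_json(text):
    """Return the validated JSON object from a model reply, or None if unusable."""
    # Take the span from the first "{" to the last "}" in the reply.
    match = re.search(r"\{.*\}", text, flags=re.DOTALL)
    if match is None:
        return None
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or any(key not in data for key in REQUIRED_KEYS):
        return None
    try:
        # Coerce scores to float so "0.8" and 0.8 are treated alike.
        data["viability_score"] = float(data["viability_score"])
        data["consistency_score"] = float(data["consistency_score"])
    except (TypeError, ValueError):
        return None
    return data


reply = ('Here is my assessment: {"viability_score": "0.8", "consistency_score": 0.6, '
         '"risks": ["moisture"], "notes": "placeholder", "cites": [1]}')
print(parse_llm_json(reply))
```

Returning None instead of raising lets the caller retry the prompt or record the candidate as unscored.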
Typical outputs
- reranked_qwen.csv, reranked_llama.csv, reranked_flan.csv
- leaderboard_demo_side_by_side.html
import json

import pandas as pd


def _keyify(df):
    """Add a canonical cand_key column (A|B|X|additives) if it is missing."""
    def canon(x):
        # Normalize JSON strings and Python objects to one sorted-key JSON form.
        return json.dumps(json.loads(x), sort_keys=True) if isinstance(x, str) else json.dumps(x, sort_keys=True)

    def k(r):
        return "|".join([canon(r["A"]), canon(r["B"]), canon(r["X"]), str(r.get("additives", ""))])

    if "cand_key" not in df.columns:
        df = df.copy()
        df["cand_key"] = df.apply(k, axis=1)
    return df


# The 25 candidates with the highest predicted T80 form the base table.
base = _keyify(pd.read_csv("top_candidates.csv")).sort_values("pred_T80_h", ascending=False).head(25)
qwen = _keyify(pd.read_csv("reranked_qwen.csv"))
llama = _keyify(pd.read_csv("reranked_llama.csv"))
flan = _keyify(pd.read_csv("reranked_flan.csv"))

# Left-join each model's scores onto the base table by candidate key.
out = (base[["cand_key", "A", "B", "X", "additives", "pred_T80_h"]]
       .merge(qwen[["cand_key", "final_qwen", "llm_viability"]].rename(columns={"llm_viability": "llm_viability_qwen"}), on="cand_key", how="left")
       .merge(llama[["cand_key", "final_llama", "llm_viability"]].rename(columns={"llm_viability": "llm_viability_llama"}), on="cand_key", how="left")
       .merge(flan[["cand_key", "final_flan", "llm_viability"]].rename(columns={"llm_viability": "llm_viability_flan"}), on="cand_key", how="left"))
out.to_csv("leaderboard_top25_merged.csv", index=False)
print("saved leaderboard_top25_merged.csv")

- Tune chemistry windows in generate_and_score.py (within_bounds for t, μ) and --novelty-floor.
- Large models can be loaded 4/8-bit in notebooks to fit consumer GPUs.
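For the quantized-loading tip, one common route is the bitsandbytes integration in Hugging Face transformers. The sketch below is an assumption about how a notebook could do this, not a copy of the notebooks' code: load_causal_lm_4bit is a hypothetical helper, it needs torch, transformers, accelerate, and bitsandbytes plus a CUDA GPU, and it applies to decoder-only models. An encoder-decoder model such as Flan-T5 would use AutoModelForSeq2SeqLM instead.

```python
def load_causal_lm_4bit(model_id):
    """Load a decoder-only model with 4-bit weights to reduce GPU memory use."""
    # Imported lazily so this helper can be defined without the GPU stack installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,                     # use load_in_8bit=True for 8-bit
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",                     # let accelerate place the layers
    )
    return tokenizer, model
```

Pass whichever model identifier the notebook already uses; quantization can shift scores slightly, so compare rerankings only between runs with the same setting.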
MIT License.
Built on our T80 surrogate + cleaned MaterialsZone data, we added literature-grounded RAG+LLM reranking. This repo ships constrained Top-25 generation, JSON scoring/notes, and a sortable dashboard with side-by-side model ranks.
