A monorepo of AI safety experiments: sparse autoencoders on gelu-2l with activation steering, explainable AI on Kickstarter data using SHAP and LIME, and activation oracles on Qwen2.5-0.5B.
transformers pytorch xgboost lora sparse-autoencoders ai-safety lime explainable-ai mlops mlflow shap mlflow-tracking mechanistic-interpretability qwen activation-steering transformer-lens ai-safety-research sae-lens gelu-2l activation-oracles
Updated May 13, 2026 - Python