@aisilab

AI Safety & Interpretability Lab

Popular repositories

  1. arbiter Public

    Run HuggingFace models through freeform questions and judge responses with an LLM.

    Python 4 1

  2. psychological-safety Public

    Python 2

  3. diffing-toolkit Public

    Forked from science-of-finetuning/diffing-toolkit

    A toolkit that provides a range of model diffing techniques including a UI to visualize them interactively.

    Python 1 1

  4. aisilab.github.io Public

    Website of the AI Safety & Interpretability Lab at SDU

    HTML 1

  5. Superadditive-cooperation-LLMs Public

    Forked from pippot/Superadditive-cooperation-LLMs

    Study of superadditive cooperation between Large Language Model agents in an Iterated Prisoner's Dilemma tournament

    Python

  6. Prolog-as-a-Tool Public

    Forked from niklasmellgren/grpo-prolog-inference

    Reinforcement fine-tuning LLMs with GRPO to generate Prolog code for symbolic reasoning and inference

    Python

Repositories

Showing 8 of 8 repositories
