@safety-research

Safety Research

Popular repositories

  1. bloom Public

    bloom - evaluate any behavior immediately  🌸🌱

    Python, 1.2k stars, 149 forks

  2. petri Public

    An alignment auditing agent capable of quickly exploring alignment hypotheses

    Python, 955 stars, 141 forks

  3. persona_vectors Public

    Persona Vectors: Monitoring and Controlling Character Traits in Language Models

    Python, 377 stars, 93 forks

  4. SCONE-bench Public

    171 stars, 29 forks

  5. assistant-axis Public

    The Assistant Axis is a direction in activation space that captures how "Assistant-like" a model's behavior is. Models can drift away from the Assistant during conversations—sometimes toward bizarr…

    Jupyter Notebook, 117 stars, 29 forks

  6. safety-tooling Public

    Inference API for many LLMs and other useful tools for empirical research

    Python, 111 stars, 36 forks

Repositories

Showing 10 of 41 repositories