Exploring AI behavior through evaluation, red teaming, and cultural benchmarking.
Background: Google Cybersecurity certified | Adversarial AI testing on Lakera Gandalf (2nd place in model league)
- 🔬 Competing Circuits Across Languages — Safety vs. Instruction-Following Dynamics in Multilingual LLMs (WIP)
- Mechanistic Interpretability
- Multilingual AI Safety
- Red Teaming