👟 SUP: Sycophancy Under Pressure
Updated Jan 11, 2026 - Python
LLM benchmark and leaderboard for narrator-bias sycophancy, opposite-narrator contradictions, and judgment consistency.
Behavioral auditing & repair toolkit for LLMs. Measures 8 dimensions via confidence probes.
This repo shows how to code sycophancy in LLMs as a Bayesian latent model.
Adversarial testing of LLMs on constraint satisfaction deadlocks
Rigorous framework for evaluating AI alignment properties — sycophancy, corrigibility, deception, goal stability, and power-seeking — with statistical confidence intervals
Systematic probing toolkit for alignment-relevant LLM behaviors: sycophancy, sandbagging, power-seeking, deceptive alignment, and corrigibility failures
Official code for "From Fact to Judgment: Investigating the Impact of Task Framing on LLM Conviction in Dialogue Systems" (IWSDS 2026)
A prompt that uses conceptual scaffolding to prevent contextual bleed between the active discussion and the meta-object being discussed.
A deterministic safety layer for probabilistic AI systems — preventing delusion reinforcement and AI-induced psychological harm through immutable governance