Concept · 9 episodes

Linear Representation

Definition

The linear representation hypothesis is the conjecture that meaningful concepts in neural networks are encoded as approximately linear directions in activation space: that “truth,” “refusal,” or “Spanish” each corresponds to a vector you can find and then steer along. Activation steering and most of mechanistic interpretability rely heavily on this hypothesis being at least mostly true.
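To make "find a vector and steer along it" concrete, here is a minimal sketch under stated assumptions. Real activations would be read from a hidden layer of a model; here synthetic data stands in, with a concept direction deliberately planted so we can check the estimate. The direction is found by difference of class means, which is one common choice (logistic-regression probes and sparse-autoencoder features are others), not necessarily the method of any episode listed below. All names (`concept_direction`, `concept_score`, `steer`, the hidden size `d`, the strength `alpha`) are hypothetical and chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # hypothetical hidden size

# Synthetic stand-in for model activations: plant a unit-norm concept direction,
# shift "positive" examples along +direction and "negative" ones along -direction.
true_dir = rng.normal(size=d)
true_dir /= np.linalg.norm(true_dir)

def sample(n, sign):
    """Draw n noisy activations offset by sign * 3 along the planted direction."""
    noise = rng.normal(size=(n, d))
    return noise + sign * 3.0 * true_dir

pos = sample(200, +1.0)  # e.g. activations on inputs with the concept (hypothetical)
neg = sample(200, -1.0)  # e.g. activations on inputs without it (hypothetical)

def concept_direction(pos_acts, neg_acts):
    """Difference-of-means estimate of the concept direction, unit-normalized."""
    v = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    return v / np.linalg.norm(v)

def concept_score(acts, direction):
    """Linear read-out: projection of each activation onto the direction."""
    return acts @ direction

def steer(acts, direction, alpha):
    """Steering: shift every activation by alpha along the direction."""
    return acts + alpha * direction

v = concept_direction(pos, neg)
print("cosine with planted direction:", float(v @ true_dir))

# Zero-threshold linear classifier built from the projection.
scores = np.concatenate([concept_score(pos, v), concept_score(neg, v)])
labels = np.concatenate([np.ones(200), np.zeros(200)])
acc = np.mean((scores > 0) == labels)
print("read-out accuracy:", acc)

# Because v is unit-norm, steering raises each projection by alpha.
steered = steer(neg, v, alpha=6.0)
print("mean score before:", concept_score(neg, v).mean())
print("mean score after: ", concept_score(steered, v).mean())
```

If the hypothesis holds for a concept, the same two operations (project to read it, add a scaled direction to push it) work on real activations; when a concept is not linearly encoded, the read-out accuracy of a probe like this is what tends to drop.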

Episodes covering this

204
The Length Estimate Hiding Inside a Word-by-Word Model
How Much is Left? LLMs Linearly Encode Their Remaining Output Length
14 min·Jul 07, 2026
171
The Safety Decision a Model Makes Before It Thinks a Word
Do Thinking Tokens Help with Safety?
Ri, Panigrahi, Arora · Princeton Language and Intelligence·25 min·Jun 25, 2026
153
Catching a Lie From the Inside, When the Words Look Completely Honest
Rift: A Conflict Signature for Deception in Language Models
Nyoma · Harmonic Labs·26 min·Jun 18, 2026
098
Finding Millions of Readable Concepts Inside a Real, Deployed AI Model
Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet
Templeton, Conerly, Marcus et al. · Anthropic·28 min·May 29, 2026
055
Why LLM Judges Flip Their Verdicts When You Change the Question Format
Judge Circuits
Feldhus, Baeumel, Golimblevskaia et al. · Technische Universität Berlin / BIFOLD·26 min·May 19, 2026
040
Two Frozen Models Learn to Whisper: Coupling Through Hidden States
The Bicameral Model: Bidirectional Hidden-State Coupling Between Parallel Language Models
Flamant, Ghai, Shimizu · AWS Agentic AI·29 min·May 13, 2026
038
How LLMs Get Persuaded: One Attention Head, A Tetrahedron, And A Single Dial
How LLMs Are Persuaded: A Few Attention Heads, Rerouted
Sun, Kong, Zhang et al. · Northeastern University·23 min·May 12, 2026
037
Why Hallucination Detectors Miss Stale Facts: A Geometric Story About What Models Know But Don't Say
The Geometry of Forgetting: Temporal Knowledge Drift as an Independent Axis in LLM Representations
Elbadry, Heakl, Zhang et al. · Mohamed bin Zayed University of Artificial Intelligence (MBZUAI)·27 min·May 12, 2026
004
The Sycophancy Circuit That Survives Alignment Training
LLMs Know They're Wrong and Agree Anyway: The Shared Sycophancy-Lying Circuit
Pandey · Georgia Institute of Technology·29 min·May 01, 2026

Worth reading next

Papers on this topic that we haven't covered in a deep dive yet but recommend.