Concept · 5 episodes

Circuit Analysis

Definition

Circuit analysis in mechanistic interpretability is the work of identifying small subgraphs of a neural network (specific attention heads and MLP neurons, connected in a specific way) that together implement a recognizable algorithm. It treats neural networks less like inscrutable matrices and more like compiled programs to be reverse-engineered.
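To make the idea concrete, here is a minimal sketch of one common circuit-finding move: knock out each component in turn and keep the ones whose removal changes the output. Everything in it is an assumption made for illustration. The three "heads", the `run` function, and the threshold are hypothetical stand-ins for a real model, and zero-ablation is only one of several techniques used in practice; it is not the method of any paper listed here.

```python
# Hypothetical toy "model": three components each write a number into a
# shared sum, loosely mimicking heads writing into a residual stream.
def head_a(x):
    return x[0]          # copies the first input

def head_b(x):
    return x[1]          # copies the second input

def head_c(x):
    return 0.0 * x[2]    # contributes nothing on this task

COMPONENTS = {"head_a": head_a, "head_b": head_b, "head_c": head_c}

def run(x, ablated=frozenset()):
    """Sum component outputs, replacing ablated components with zero."""
    return sum(0.0 if name in ablated else fn(x)
               for name, fn in COMPONENTS.items())

def ablation_effects(inputs):
    """Mean absolute change in output when each component is zero-ablated."""
    effects = {}
    for name in COMPONENTS:
        diffs = [abs(run(x) - run(x, frozenset({name}))) for x in inputs]
        effects[name] = sum(diffs) / len(diffs)
    return effects

def find_circuit(inputs, threshold=1e-6):
    """Return the components whose ablation changes the output."""
    effects = ablation_effects(inputs)
    return sorted(name for name, effect in effects.items() if effect > threshold)

inputs = [(1.0, 2.0, 3.0), (0.5, -1.0, 4.0), (2.0, 0.0, -1.0)]
circuit = find_circuit(inputs)
print(circuit)
```

In this toy setup the "circuit" is the pair of components that actually carry the computation, while `head_c` can be removed without changing anything. Real analyses work on far larger graphs and usually compare clean and corrupted inputs rather than zeroing activations, but the logic of isolating a small subgraph is the same.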

Episodes covering this

055
Why LLM Judges Flip Their Verdicts When You Change the Question Format
Judge Circuits
Feldhus, Baeumel, Golimblevskaia et al. · Technische Universität Berlin / BIFOLD · 26 min · May 19, 2026
038
How LLMs Get Persuaded: One Attention Head, A Tetrahedron, And A Single Dial
How LLMs Are Persuaded: A Few Attention Heads, Rerouted
Sun, Kong, Zhang et al. · Northeastern University · 23 min · May 12, 2026
037
Why Hallucination Detectors Miss Stale Facts: A Geometric Story About What Models Know But Don't Say
The Geometry of Forgetting: Temporal Knowledge Drift as an Independent Axis in LLM Representations
Elbadry, Heakl, Zhang et al. · Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) · 27 min · May 12, 2026
023
Why a Small Agent Confidently Overwrites Memories It Doesn't Understand
What Happens Inside Agent Memory? Circuit Analysis from Emergence to Diagnosis
Mao, Zhao, Penn et al. · City University of Hong Kong · 23 min · May 07, 2026
004
The Sycophancy Circuit That Survives Alignment Training
LLMs Know They're Wrong and Agree Anyway: The Shared Sycophancy-Lying Circuit
Pandey · Georgia Institute of Technology · 29 min · May 01, 2026

Worth reading next

Papers we haven't done a deep dive on yet, but would recommend on this topic.