Concept · 5 episodes

Circuit Analysis


Definition

Circuit analysis, in mechanistic interpretability, is the work of identifying small subgraphs of a neural network (specific attention heads and MLP neurons, connected in specific ways) that together implement a recognizable, human-interpretable algorithm. It treats a trained network less like an inscrutable collection of weight matrices and more like a compiled program to be reverse-engineered.
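To make "finding a small subgraph" concrete, here is a minimal sketch on a hypothetical toy network. The node names (`x0`, `head_a`, `mlp`, ...), edge weights, and threshold are all invented for illustration. The procedure is a simplified greedy edge-pruning loop, loosely inspired by automated circuit discovery methods such as ACDC: walk edges from the output backwards, ablate each one, and drop it permanently if the output barely changes. Real methods usually patch in activations from a corrupted input and score a task metric (for example KL divergence) over many examples, rather than zeroing edges on a single input as this sketch does.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]

# Hypothetical toy network; nodes are listed in topological order.
NODES = ["x0", "x1", "head_a", "head_b", "mlp", "out"]
INPUTS = {"x0": 1.0, "x1": 1.0}
EDGES: Dict[Edge, float] = {
    ("x0", "head_a"): 2.0,
    ("x1", "head_b"): 0.05,   # weak, mostly irrelevant path
    ("head_a", "mlp"): 1.5,
    ("head_b", "mlp"): 1.0,
    ("x1", "mlp"): 0.01,      # tiny direct contribution
    ("mlp", "out"): 1.0,
}


def relu(v: float) -> float:
    return max(0.0, v)


def forward(active: Set[Edge]) -> float:
    """Run the toy net using only edges in `active`; ablated edges contribute 0."""
    vals = dict(INPUTS)
    for node in NODES:
        if node in vals:  # input nodes are already set
            continue
        total = sum(
            w * vals[src]
            for (src, dst), w in EDGES.items()
            if dst == node and (src, dst) in active
        )
        # Hidden components use ReLU; the output node is linear.
        vals[node] = total if node == "out" else relu(total)
    return vals["out"]


def discover_circuit(threshold: float) -> Set[Edge]:
    """Greedily remove edges whose ablation shifts the output by < threshold."""
    full = forward(set(EDGES))
    kept = set(EDGES)
    # Visit edges from the output backwards (by destination node position).
    order = sorted(EDGES, key=lambda e: NODES.index(e[1]), reverse=True)
    for edge in order:
        trial = kept - {edge}
        # Compare against the full model, so small errors cannot silently pile up.
        if abs(forward(trial) - full) < threshold:
            kept = trial
    return kept


circuit = discover_circuit(threshold=0.1)
print(sorted(circuit))
```

With a threshold of 0.1, the weak `head_b` path and the tiny direct `x1 → mlp` edge are pruned, leaving the chain `x0 → head_a → mlp → out` as the "circuit" that carries the computation. Choosing the threshold is the key design decision: a larger value yields a smaller, more readable subgraph that explains less of the model's behavior.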

Episodes covering this

Worth reading next

Papers we haven't done a deep dive on yet, but would recommend on this topic.