Definition
Circuit analysis in mechanistic interpretability is the work of identifying small subgraphs of a neural network — specific attention heads and MLP neurons connected in a specific way — that together implement an identifiable algorithm. It treats neural nets less like inscrutable matrices and more like compiled programs to be reverse-engineered.
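To make the idea concrete, here is a minimal sketch of one simple way to look for a circuit: zero-ablation on a toy network. Everything in it is an assumption made for illustration. The four-unit model, the hand-set weights in `W_IN` and `W_OUT`, and the helper names `forward` and `find_circuit` are hypothetical, and zero-ablation is only one of several techniques used in practice. The toy model is wired so that hidden units 0 and 1 compute |x| while units 2 and 3 contribute nothing, so the "circuit" is known in advance and the search can be checked against it.

```python
# Hypothetical toy model: 1 input, 4 hidden ReLU units, 1 output.
# Units 0 and 1 are hand-wired to compute |x| = relu(x) + relu(-x).
# Units 2 and 3 have zero output weight, so they are inert distractors.
W_IN = [1.0, -1.0, 0.5, -0.3]
W_OUT = [1.0, 1.0, 0.0, 0.0]


def forward(x, ablate=()):
    """Run the toy model on scalar x, zeroing any hidden units in `ablate`."""
    total = 0.0
    for unit, (w_in, w_out) in enumerate(zip(W_IN, W_OUT)):
        activation = 0.0 if unit in ablate else max(0.0, w_in * x)
        total += w_out * activation
    return total


def find_circuit(inputs, threshold=1e-6):
    """Return the hidden units whose zero-ablation changes the output."""
    circuit = []
    for unit in range(len(W_IN)):
        # Mean absolute change in output when this one unit is knocked out.
        effect = sum(
            abs(forward(x, ablate=(unit,)) - forward(x)) for x in inputs
        ) / len(inputs)
        if effect > threshold:
            circuit.append(unit)
    return circuit


inputs = [i / 10 for i in range(-20, 21)]
circuit = find_circuit(inputs)
print(circuit)  # [0, 1]
```

In this sketch, knocking out unit 2 or 3 leaves every output unchanged, so the two-unit subgraph alone accounts for the behavior. Real networks are far messier: components are redundant, effects are distributed, and zeroing an activation can push the model off its normal operating range, so this should be read as an illustration of the concept rather than a recipe.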
Episodes covering this
Worth reading next
Papers we haven't done a deep dive on yet, but would recommend on this topic.