Definition
Studying the inner workings of AI models the way an engineer would trace an electronic circuit, to figure out what each part does.
A research area focused on reverse-engineering specific computations and circuits inside neural networks rather than only describing input-output behavior.
Also called: mechanistic
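
Example
A minimal sketch of the contrast drawn in the definition, assuming a toy 2-2-1 ReLU network whose weights are set by hand so that it computes XOR. The weights and the names forward, hidden_activations, and ablate are hypothetical and chosen for illustration only; real work of this kind targets trained networks with far more units. The behavioral view records only the input-output table, while the mechanistic view reads the hidden activations and zeroes one unit to see what it contributes.

```python
def relu(v):
    return max(0.0, v)

# Hypothetical hand-set weights (an assumption for illustration, not a trained model).
W_HIDDEN = [(1.0, 1.0), (1.0, 1.0)]  # input weights for hidden units 0 and 1
B_HIDDEN = [0.0, -1.0]               # hidden biases
W_OUT = [1.0, -2.0]                  # output weights

def hidden_activations(x1, x2):
    """Return the activation of each hidden unit for one input pair."""
    return [relu(w1 * x1 + w2 * x2 + b)
            for (w1, w2), b in zip(W_HIDDEN, B_HIDDEN)]

def forward(x1, x2, ablate=None):
    """Compute the output; optionally zero one hidden unit first."""
    h = hidden_activations(x1, x2)
    if ablate is not None:
        h[ablate] = 0.0
    return sum(w * a for w, a in zip(W_OUT, h))

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Behavioral view: only the input-output mapping.
behavior = {x: forward(*x) for x in INPUTS}

# Mechanistic view: what each hidden unit does on each input.
activations = {x: hidden_activations(*x) for x in INPUTS}

# Intervention: remove hidden unit 1 and observe how the output changes.
without_unit_1 = {x: forward(*x, ablate=1) for x in INPUTS}

if __name__ == "__main__":
    for x in INPUTS:
        print(x, behavior[x], activations[x], without_unit_1[x])
```

In this toy setup, hidden unit 0 counts how many inputs are on, and hidden unit 1 fires only when both are on. Zeroing unit 1 changes the output only for the input (1, 1), which suggests that unit 1 supplies the correction turning a sum into XOR. The input-output table alone would not reveal that division of labor.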