Definition
Inference cost is the price — in dollars, watts, or latency — of running a model once it’s trained. At scale it dwarfs training cost, and it’s often the binding constraint on which features can actually ship.
Episodes covering this
Worth reading next
Papers we haven't done a deep dive on yet, but would recommend on this topic.