Definition.
For two probability distributions $P$ and $Q$ on a common sample space $\mathcal{X}$, the Kullback-Leibler (KL) Divergence is defined as:

$$D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{x \in \mathcal{X}} P(x) \log \frac{P(x)}{Q(x)}.$$
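To make the sum above concrete, here is a minimal Python sketch; the function name `kl_divergence`, the use of NumPy, and the choice of the natural logarithm are assumptions for illustration, not part of the text:

```python
import numpy as np

def kl_divergence(p, q):
    """Compute D_KL(P || Q) for two discrete distributions given as probability vectors.

    Assumes p and q are valid distributions over the same sample space.
    By convention, terms with p[x] == 0 contribute 0; q must be positive
    wherever p is positive for the divergence to be finite.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # skip zero-probability outcomes (0 * log 0 := 0)
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))
```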
Theorem. (Gibbs’ Inequality)
Let $P$, $Q$ be two discrete probability distributions on a discrete sample space $\mathcal{X}$. Then $D_{\mathrm{KL}}(P \,\|\, Q) \geq 0$, with equality if and only if $P = Q$.
Gibbs’ Inequality allows us to think of KL Divergence as a notion of “distance” between probability distributions. Note, however, that KL Divergence is not an actual metric in the traditional mathematical sense: it is not symmetric and does not satisfy the triangle inequality.
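A small numerical check of these two properties, reusing the `kl_divergence` sketch above (the specific distributions here are invented purely for illustration):

```python
p = [0.1, 0.4, 0.5]
q = [0.8, 0.1, 0.1]

print(kl_divergence(p, q))  # ~1.151
print(kl_divergence(q, p))  # ~1.364, not equal to the above: KL is not symmetric
print(kl_divergence(p, p))  # 0.0, the equality case of Gibbs' Inequality
```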