Definition.

For two probability distributions $P$ and $Q$ on a common sample space $\mathcal{X}$, the Kullback-Leibler (KL) Divergence is defined as:

$$D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{x \in \mathcal{X}} P(x) \log \frac{P(x)}{Q(x)}.$$

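To make the definition concrete, here is a minimal Python sketch of the sum above for two finite distributions given as probability arrays. The function name `kl_divergence` and the use of NumPy are illustrative choices, not part of the text.

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(P || Q) for two discrete distributions over the same sample space,
    given as arrays of probabilities.

    Terms with p[x] == 0 contribute nothing (the usual 0 * log 0 convention);
    a point with q[x] == 0 but p[x] > 0 yields +inf.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    support = p > 0  # restrict the sum to the support of P
    return float(np.sum(p[support] * np.log(p[support] / q[support])))

# Example: a slightly skewed distribution against the uniform one.
print(kl_divergence([0.25, 0.25, 0.5], [1/3, 1/3, 1/3]))  # ≈ 0.059 nats
```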
Theorem. (Gibbs’ Inequality)

Let $P$, $Q$ be two discrete probability distributions on a discrete sample space $\mathcal{X}$. Then $D_{\mathrm{KL}}(P \,\|\, Q) \geq 0$, with equality if and only if $P = Q$.
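A standard proof sketch (not spelled out above) applies Jensen's inequality to the concave logarithm, with the sums running over the support of $P$:

$$-D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{x} P(x)\log\frac{Q(x)}{P(x)} \;\le\; \log\sum_{x} P(x)\,\frac{Q(x)}{P(x)} = \log\sum_{x} Q(x) \le \log 1 = 0,$$

with equality only when $Q(x)/P(x)$ is constant on the support of $P$ and that support carries all of $Q$'s mass, which forces $P = Q$.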


Gibbs’ Inequality allows us to think of KL Divergence as a notion of “distance” between probability distributions. Note, however, that KL Divergence is not an actual metric in the traditional mathematical sense: it is not symmetric and does not satisfy the triangle inequality.
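The lack of symmetry is easy to verify numerically; this continues the illustrative `kl_divergence` sketch above:

```python
# Continuing the illustrative sketch above: a biased coin P vs. a fair coin Q.
p = [0.9, 0.1]
q = [0.5, 0.5]

print(kl_divergence(p, q))  # ≈ 0.368 nats
print(kl_divergence(q, p))  # ≈ 0.511 nats: the two directions disagree,
                            # so D_KL(P || Q) != D_KL(Q || P) in general.
```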