Suppose $p$, $q$ are two probability distributions over the same sample space $\mathcal{X}$.

Definition.

The Cross Entropy of $p$ and $q$ is given by

$$H(p, q) = -\sum_{x \in \mathcal{X}} p(x) \log q(x).$$
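
As a concrete illustration, here is a minimal Python sketch (using NumPy; the distributions `p` and `q` are made up for the example) that computes the sum above directly:

```python
import numpy as np

def cross_entropy(p, q):
    """Cross Entropy H(p, q) = -sum_x p(x) log q(x), in nats."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0  # terms with p(x) = 0 contribute nothing; skip them to avoid 0 * log 0
    return -np.sum(p[mask] * np.log(q[mask]))

p = np.array([0.5, 0.25, 0.25])  # hypothetical "true" / empirical distribution
q = np.array([0.4, 0.4, 0.2])    # hypothetical model distribution
print(cross_entropy(p, q))       # ≈ 1.0896 nats
```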
The KL Divergence is related to the Cross Entropy as follows: $D_{\mathrm{KL}}(p \,\|\, q) = H(p, q) - H(p)$, where $H(p) = -\sum_{x \in \mathcal{X}} p(x) \log p(x)$ is the entropy of $p$. Because $p$ is the fixed empirical data distribution while $q$ is the model's distribution, $H(p)$ is a constant with respect to the model parameters, so minimizing the KL Divergence and minimizing the Cross Entropy are equivalent.
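
To check the identity numerically, here is a short sketch that reuses `cross_entropy`, `p`, and `q` from the previous block; `entropy` and `kl_divergence` are hypothetical helpers written just for this check:

```python
def entropy(p):
    """Entropy H(p) = -sum_x p(x) log p(x), in nats."""
    p = np.asarray(p, dtype=float)
    mask = p > 0
    return -np.sum(p[mask] * np.log(p[mask]))

def kl_divergence(p, q):
    """KL Divergence D_KL(p || q) = sum_x p(x) log(p(x) / q(x))."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

print(kl_divergence(p, q))               # direct computation, ≈ 0.0499
print(cross_entropy(p, q) - entropy(p))  # via the identity; agrees up to float rounding
```

Since $H(p)$ does not depend on $q$, changing the model $q$ shifts $D_{\mathrm{KL}}(p \,\|\, q)$ and $H(p, q)$ by exactly the same amount, which is why the two objectives share the same minimizer.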