The Kullback-Leibler (KL) divergence is frequently used in data science. For
discrete distributions on large state spaces, approximations of probability
vectors may result in a few small negative entries, rendering the KL divergence
undefined. We address this problem by introducing a parameterized family of
substitute divergence measures, the shifted KL (sKL) divergences. Our approach
is generic and does not increase computational overhead. We show
that the sKL divergence shares important theoretical properties with the KL
divergence and discuss how its shift parameters should be chosen. If Gaussian
noise is added to a probability vector, we prove that the average sKL
divergence converges to the KL divergence for small enough noise. We also show
that our method solves the problem of negative entries in an application from
computational oncology, the optimization of Mutual Hazard Networks for cancer
progression using tensor-train approximations.
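
The abstract does not state the precise form of the sKL divergence. As a rough illustration only, the sketch below assumes that the shift adds nonnegative parameters s_i to both probability vectors before evaluating the usual KL sum, which is already enough to keep the value well defined when a few entries are slightly negative; the function name, the uniform choice of s, and this definition are assumptions, not the paper's exact construction.

    import numpy as np

    def skl_divergence(p, q, s):
        # Hypothetical shifted KL: add nonnegative shift parameters s to both
        # vectors so that small negative entries become positive, then apply
        # the usual KL sum. Sketch only; the paper's definition may differ.
        p, q, s = (np.asarray(a, dtype=float) for a in (p, q, s))
        ps, qs = p + s, q + s
        return float(np.sum(ps * np.log(ps / qs)))

    # An approximated probability vector with a tiny negative entry, for which
    # the plain KL divergence sum_i p_i log(p_i / q_i) would be undefined.
    p_approx = np.array([0.7, 0.3 + 1e-9, -1e-9])
    q = np.array([0.6, 0.35, 0.05])
    s = np.full(3, 1e-6)   # shift parameters; a uniform value is assumed here
    print(skl_divergence(p_approx, q, s))   # finite, close to the KL value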