Divergence and Distance

DiscreteEntropy.cross_entropy - Function
 cross_entropy(P::CountVector, Q::CountVector, ::Type{T}) where {T<:AbstractEstimator}

\[H(P,Q) = -\sum_x P(x) \log(Q(x))\]

Compute the cross entropy of $P$ and $Q$, given an estimator of type $T$. $P$ and $Q$ must be the same length. Both vectors are normalised. The cross entropy of a probability distribution $P$ with itself equals its entropy, i.e. $H(P, P) = H(P)$.

Example


julia> P = cvector([1,2,3,4,3,2]);
julia> Q = cvector([2,5,5,4,3,4]);

julia> ce = cross_entropy(P, Q, MaximumLikelihood)
1.778564897565542

Note: not every estimator is currently supported.
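
The identity $H(P, P) = H(P)$ can be checked directly against an entropy estimate. A minimal sketch, assuming from_counts and estimate_h from DiscreteEntropy behave as documented:

julia> cross_entropy(P, P, MaximumLikelihood) ≈ estimate_h(from_counts(P), MaximumLikelihood)
true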

DiscreteEntropy.kl_divergence - Function
kl_divergence(P::CountVector, Q::CountVector, estimator::Type{T}; truncate::Union{Nothing, Int} = nothing) where {T<:AbstractEstimator}

\[D_{KL}(P \Vert Q) = \sum_{x \in X} P(x) \log \left( \frac{P(x)}{Q(x)} \right)\]

Compute the Kullback-Leibler divergence between two discrete distributions. $P$ and $Q$ must be the same length. If the distributions are not normalised, they will be normalised.

If the distributions are not over the same support, or if the cross entropy is negative, the result is Inf.

If truncate is set to an integer value x, the result is rounded to x decimal places.
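
For example, reusing the histograms from the cross entropy example above. A minimal sketch; under maximum likelihood, $D_{KL}(P \Vert Q) = H(P,Q) - H(P)$, here approximately 0.0645:

julia> P = cvector([1,2,3,4,3,2]);

julia> Q = cvector([2,5,5,4,3,4]);

julia> kl_divergence(P, Q, MaximumLikelihood; truncate=4)
0.0645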

DiscreteEntropy.jensen_shannon_divergence - Function
jensen_shannon_divergence(countsP::CountVector, countsQ::CountVector)
jensen_shannon_divergence(countsP::CountVector, countsQ::CountVector, estimator::Type{T}) where {T<:NonParameterisedEstimator}
jensen_shannon_divergence(countsP::CountVector, countsQ::CountVector, estimator::Type{Bayes}, α)

Compute the Jensen-Shannon divergence between discrete distributions $P$ and $Q$, as represented by their histograms. If no estimator is specified, it defaults to MaximumLikelihood.

\[\widehat{JS}(P, Q) = \hat{H}\left(\frac{P + Q}{2} \right) - \frac{\hat{H}(P) + \hat{H}(Q)}{2}\]
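
For example, reusing the histograms from above. A minimal sketch, assuming Bayes takes the concentration parameter α directly, with α = 1 corresponding to a Laplace prior:

julia> P = cvector([1,2,3,4,3,2]);

julia> Q = cvector([2,5,5,4,3,4]);

julia> js = jensen_shannon_divergence(P, Q);                     # defaults to MaximumLikelihood

julia> js_bayes = jensen_shannon_divergence(P, Q, Bayes, 1.0);   # Bayesian estimate, α = 1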

DiscreteEntropy.jeffreys_divergence - Function
jeffreys_divergence(P::CountVector, Q::CountVector)
jeffreys_divergence(P::CountVector, Q::CountVector, estimator::Type{T}) where T<:AbstractEstimator

\[J(P, Q) = D_{KL}(P \Vert Q) + D_{KL}(Q \Vert P)\]

If no estimator is specified, the divergence is calculated using maximum likelihood.
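
By the definition above, the result should agree with the sum of the two KL divergences under the same estimator. A minimal sketch, reusing the histograms from earlier:

julia> P = cvector([1,2,3,4,3,2]);

julia> Q = cvector([2,5,5,4,3,4]);

julia> jeffreys_divergence(P, Q) ≈ kl_divergence(P, Q, MaximumLikelihood) + kl_divergence(Q, P, MaximumLikelihood)
true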

External Links

Paper

DiscreteEntropy.uncertainty_coefficient - Function
 uncertainty_coefficient(joint::Matrix{I}, estimator::Type{T}; symmetric=false) where {T<:AbstractEstimator, I<:Real}

Compute Theil's uncertainty coefficient on the 2-dimensional matrix joint, using estimator, where joint is the histogram of the joint distribution of two random variables $(X, Y)$ and $I(X;Y)$ is the (estimated) mutual information.

\[U(X \mid Y) = \frac{I(X;Y)}{H(X)}\]

If symmetric is true, compute the weighted average between $X$ and $Y$:

\[U(X, Y) = 2 \left[ \frac{H(X) + H(Y) - H(X, Y)} {H(X) + H(Y)} \right]\]
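
As an illustration, consider two hypothetical 2×2 joint histograms: one where $X$ determines $Y$, so the coefficient should be close to 1, and one where the variables are independent, so it should be close to 0. A minimal sketch:

julia> u_dep = uncertainty_coefficient([4 0; 0 6], MaximumLikelihood);    # ≈ 1.0: X determines Y

julia> u_ind = uncertainty_coefficient([3 3; 3 3], MaximumLikelihood);    # ≈ 0.0: X and Y independent

julia> u_sym = uncertainty_coefficient([4 0; 0 6], MaximumLikelihood; symmetric=true);   # ≈ 1.0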
