Divergence and Distance

DiscreteEntropy.cross_entropy - Function
 cross_entropy(P::CountVector, Q::CountVector, ::Type{T}) where {T<:AbstractEstimator}

\[H(P,Q) = -\sum_x P(x) \log(Q(x))\]

Compute the cross entropy of $P$ and $Q$, given an estimator of type $T$. $P$ and $Q$ must be the same length. Both vectors are normalised. The cross entropy of a probability distribution $P$ with itself equals its entropy, i.e. $H(P, P) = H(P)$.

Example


julia> P = cvector([1,2,3,4,3,2]);
julia> Q = cvector([2,5,5,4,3,4]);

julia> ce = cross_entropy(P, Q, MaximumLikelihood)
1.778564897565542

Note: not every estimator is currently supported.
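
The identity $H(P, P) = H(P)$ can be checked directly against an entropy estimate. A minimal sketch, assuming from_counts and estimate_h from DiscreteEntropy behave as documented:

julia> cross_entropy(P, P, MaximumLikelihood) ≈ estimate_h(from_counts(P), MaximumLikelihood)
true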

DiscreteEntropy.kl_divergence - Function
kl_divergence(P::CountVector, Q::CountVector, estimator::Type{T}; truncate::Union{Nothing, Int} = nothing) where {T<:AbstractEstimator}

\[D_{KL}(P \Vert Q) = \sum_{x \in X} P(x) \log \left( \frac{P(x)}{Q(x)} \right)\]

Compute the Kullback-Leibler divergence between two discrete distributions. $P$ and $Q$ must be the same length. If the distributions are not normalised, they will be normalised.

If the distributions are not over the same support, or if the cross entropy is negative, the result is Inf.

If truncate is set to an integer value x, the result is rounded to x decimal places.
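
For example, reusing the histograms from the cross entropy example above. A minimal sketch; under maximum likelihood, $D_{KL}(P \Vert Q) = H(P,Q) - H(P)$, here approximately 0.0645:

julia> P = cvector([1,2,3,4,3,2]);

julia> Q = cvector([2,5,5,4,3,4]);

julia> kl_divergence(P, Q, MaximumLikelihood; truncate=4)
0.0645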

DiscreteEntropy.jensen_shannon_divergence - Function
jensen_shannon_divergence(countsP::CountVector, countsQ::CountVector)
jensen_shannon_divergence(countsP::CountVector, countsQ::CountVector, estimator::Type{T}) where {T<:NonParameterisedEstimator}
jensen_shannon_divergence(countsP::CountVector, countsQ::CountVector, estimator::Type{Bayes}, α)

Compute the Jensen-Shannon divergence between discrete distributions $P$ and $Q$, as represented by their histograms. If no estimator is specified, it defaults to MaximumLikelihood.

\[\widehat{JS}(P, Q) = \hat{H}\left(\frac{P + Q}{2} \right) - \frac{\hat{H}(P) + \hat{H}(Q)}{2}\]
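
For example, reusing the histograms from above. A minimal sketch, assuming Bayes takes the concentration parameter α directly, with α = 1 corresponding to a Laplace prior:

julia> P = cvector([1,2,3,4,3,2]);

julia> Q = cvector([2,5,5,4,3,4]);

julia> js = jensen_shannon_divergence(P, Q);                     # defaults to MaximumLikelihood

julia> js_bayes = jensen_shannon_divergence(P, Q, Bayes, 1.0);   # Bayesian estimate, α = 1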

DiscreteEntropy.jeffreys_divergence - Function
jeffreys_divergence(P::CountVector, Q::CountVector)
jeffreys_divergence(P::CountVector, Q::CountVector, estimator::Type{T}) where T<:AbstractEstimator

\[J(P, Q) = D_{KL}(P \Vert Q) + D_{KL}(Q \Vert P)\]

If no estimator is specified, the divergence is calculated using maximum likelihood.
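
By the definition above, the result should agree with the sum of the two KL divergences under the same estimator. A minimal sketch, reusing the histograms from earlier:

julia> P = cvector([1,2,3,4,3,2]);

julia> Q = cvector([2,5,5,4,3,4]);

julia> jeffreys_divergence(P, Q) ≈ kl_divergence(P, Q, MaximumLikelihood) + kl_divergence(Q, P, MaximumLikelihood)
true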

External Links

Paper

DiscreteEntropy.uncertainty_coefficient - Function
 uncertainty_coefficient(joint::Matrix{I}, estimator::Type{T}; symmetric=false) where {T<:AbstractEstimator, I<:Real}

Compute Theil's uncertainty coefficient on the 2-dimensional matrix joint, using estimator, where joint is the histogram of the joint distribution of two random variables $(X, Y)$ and $I(X;Y)$ is the (estimated) mutual information.

\[U(X \mid Y) = \frac{I(X;Y)}{H(X)}\]

If symmetric is true, compute the weighted average between $X$ and $Y$:

\[U(X, Y) = 2 \left[ \frac{H(X) + H(Y) - H(X, Y)} {H(X) + H(Y)} \right]\]
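
As an illustration, consider two hypothetical 2×2 joint histograms: one where $X$ determines $Y$, so the coefficient should be close to 1, and one where the variables are independent, so it should be close to 0. A minimal sketch:

julia> u_dep = uncertainty_coefficient([4 0; 0 6], MaximumLikelihood);    # ≈ 1.0: X determines Y

julia> u_ind = uncertainty_coefficient([3 3; 3 3], MaximumLikelihood);    # ≈ 0.0: X and Y independent

julia> u_sym = uncertainty_coefficient([4 0; 0 6], MaximumLikelihood; symmetric=true);   # ≈ 1.0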
