Estimators

We split the estimators into two broad categories, which we call Frequentist and Bayesian. We also have a few composite estimators that take either an averaging or a resampling approach to estimation.

estimate_h is parameterised on the type of the estimator; a short usage sketch follows the list below. The complete list of types is currently:

  • MaximumLikelihood
  • JackknifeMLE
  • MillerMadow
  • Grassberger
  • ChaoShen
  • Zhang
  • Bonachela
  • Shrink
  • ChaoWangJost
  • Unseen
  • Schurmann
  • SchurmannGeneralised
  • BUB
  • Bayes
  • NSB
  • PYM
  • ANSB
  • Jeffrey
  • Laplace
  • SchurmannGrassberger
  • Minimax
  • PERT
  • WuYang
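
For example, any type in this list can be passed as the second argument to estimate_h. A minimal sketch, building the count data with from_counts (shown later in these docs); the count vector is purely illustrative:

using DiscreteEntropy

data = from_counts([4, 3, 2, 1, 1])

estimate_h(data, MaximumLikelihood)   # plug-in (frequentist) estimate, in nats
estimate_h(data, ChaoShen)            # a bias-corrected frequentist estimator
estimate_h(data, NSB)                 # a Bayesian estimator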

Frequentist Estimators

DiscreteEntropy.maximum_likelihood (Function)
maximum_likelihood(data::CountData)::Float64

Compute the maximum likelihood estimation of Shannon entropy of data in nats.

\[\hat{H}_{\tiny{ML}} = - \sum_{i=1}^K p_i \log(p_i)\]

or equivalently

\[\hat{H}_{\tiny{ML}} = \log(N) - \frac{1}{N} \sum_{i=1}^{K}h_i \log(h_i)\]
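
As a quick sanity check of the equivalence, the following sketch computes both forms directly from a raw histogram in plain Julia (no package calls needed):

h = [4, 3, 2, 1]                             # histogram counts hᵢ
N = sum(h)
p = h ./ N

direct = -sum(p .* log.(p))                  # -Σ pᵢ log pᵢ
alt    = log(N) - sum(h .* log.(h)) / N      # log N - (1/N) Σ hᵢ log hᵢ

direct ≈ alt                                 # true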

DiscreteEntropy.jackknife_mle (Function)
jackknife_mle(data::CountData; corrected=false)::Tuple{AbstractFloat, AbstractFloat}

Compute the jackknifed maximum_likelihood estimate of data and the variance of the jackknifing (not the variance of the estimator itself).

If corrected is true, then the variance is scaled with data.N - 1, else it is scaled with data.N. corrected has no effect on the entropy estimation.
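
A minimal usage sketch (the count vector is only illustrative); the second element of the returned tuple is the jackknife variance:

using DiscreteEntropy

data = from_counts([4, 3, 2, 1, 1])
h, v   = jackknife_mle(data)                     # estimate in nats, jackknife variance
h2, v2 = jackknife_mle(data; corrected=true)     # same estimate, variance scaled by N - 1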

External Links

Estimation of the size of a closed population when capture probabilities vary among animals

DiscreteEntropy.miller_madow (Function)
miller_madow(data::CountData)

Compute the Miller-Madow estimate of Shannon entropy, which adds a positive bias correction based on the total number of samples seen (N) and the support size (K).

\[\hat{H}_{\tiny{MM}} = \hat{H}_{\tiny{ML}} + \frac{K - 1}{2N}\]
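
A short sketch of the correction in practice, using the data.N and data.K fields referenced elsewhere in these docs; the count vector is illustrative:

using DiscreteEntropy

data = from_counts([4, 3, 2, 1, 1])
h_mm = miller_madow(data)
h_ml = maximum_likelihood(data)

h_mm ≈ h_ml + (data.K - 1) / (2 * data.N)    # the (K - 1) / 2N correction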

DiscreteEntropy.schurmann (Function)
 schurmann(data::CountData, ξ::Float64 = ℯ^(-1/2))

Compute the Schurmann estimate of Shannon entropy of data in nats.

\[\hat{H}_{SHU} = \psi(N) - \frac{1}{N} \sum_{i=1}^{K} \, h_i \left( \psi(h_i) + (-1)^{h_i} ∫_0^{\frac{1}{\xi} - 1} \frac{t^{h_i}-1}{1+t}dt \right)\]

There is no single ideal value for $\xi$; however, the paper suggests $\xi = e^{-1/2} \approx 0.6$.
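
A minimal sketch; the second positional argument overrides the default $\xi$:

using DiscreteEntropy

data = from_counts([4, 3, 2, 1, 1])
schurmann(data)            # default ξ = exp(-1/2) ≈ 0.6065
schurmann(data, 1.0)       # an alternative choice of ξ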

External Links

schurmann

DiscreteEntropy.schurmann_generalised (Function)
schurmann_generalised(data::CountVector, xis::XiVector{T}) where {T<:Real}

\[\hat{H}_{\tiny{SHU}} = \psi(N) - \frac{1}{N} \sum_{i=1}^{K} \, h_i \left( \psi(h_i) + (-1)^{h_i} ∫_0^{\frac{1}{\xi_i} - 1} \frac{t^{h_i}-1}{1+t}dt \right) \]

Compute the generalised Schurmann entropy estimate, given a countvector data and a xivector xis, which must be of equal length.

schurmann_generalised(data::CountVector, xis::Distribution, scalar=false)

Compute the generalised Schurmann entropy estimate, given a countvector data and a Distribution xis from which the ξ values are taken.

External Links

schurmann_generalised

DiscreteEntropy.bub (Function)
 bub(data::CountData; k_max=11, truncate=false, lambda_0=0.0)

Compute the Best Upper Bound (BUB) estimate of Shannon entropy, where

  • k_max is a degree-of-freedom parameter. Paninski states that k_max ≈ 10 is optimal for most applications.
  • lambda_0 is the Lagrange multiplier on $a_0$ (see the paper for details). This can be safely left at 0 for most applications.
  • truncate reduces the number of significant digits in intermediate floating-point calculations. This exists to bring the output of this function closer to the original Matlab implementation; leaving it at false usually results in a slightly higher entropy estimate.

Example

n = [1,2,3,4,5,4,3,2,1]
(h, MM) = bub(from_counts(n))
(2.475817360451392, 0.6542542616181388)

where h is the estimate of Shannon entropy in nats and MM is the upper bound on the RMS error.

External Links

Estimation of Entropy and Mutual Information

DiscreteEntropy.chao_shen (Function)
chao_shen(data::CountData)

Compute the Chao-Shen estimate of the Shannon entropy of data in nats.

\[\hat{H}_{CS} = - \sum_{i=1}^{K} \frac{\hat{p}_i^{CS} \log \hat{p}_i^{CS}}{1 - (1 - \hat{p}_i^{CS})^N}\]

where

\[\hat{p}_i^{CS} = (1 - \frac{1 - \hat{p}_i^{ML}}{N}) \hat{p}_i^{ML}\]
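
A minimal call sketch; on undersampled data the Chao-Shen estimate is typically larger than the plug-in estimate, which is biased downwards:

using DiscreteEntropy

data = from_counts([3, 2, 1, 1, 1, 1])        # several singletons, so low sample coverage
chao_shen(data)                               # usually exceeds maximum_likelihood(data) here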

DiscreteEntropy.shrink (Function)
shrink(data::CountData)

Compute the Shrinkage, or James-Stein estimator of Shannon entropy for data in nats.

\[\hat{H}_{\tiny{SHR}} = - \sum_{i=1}^{K} \hat{p}_x^{\tiny{SHR}} \log(\hat{p}_x^{\tiny{SHR}})\]

where

\[\hat{p}_x^{\tiny{SHR}} = \lambda t_x + (1 - \lambda) \hat{p}_x^{\tiny{ML}}\]

and

\[\lambda = \frac{ 1 - \sum_{x=1}^{K} (\hat{p}_x^{\tiny{ML}})^2}{(n-1) \sum_{x=1}^K (t_x - \hat{p}_x^{\tiny{ML}})^2}\]

with

\[t_x = 1 / K\]
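
A minimal sketch; the shrinkage target $t_x$ is the uniform distribution over the K observed bins, and the count vector is illustrative:

using DiscreteEntropy

data = from_counts([8, 4, 2, 1, 1])
shrink(data)                   # shrinkage estimate in nats
maximum_likelihood(data)       # plug-in estimate, for comparison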

Notes

Based on the implementation in the R package entropy

External Links

Entropy Inference and the James-Stein Estimator

DiscreteEntropy.chao_wang_jost (Function)
chao_wang_jost(data::CountData)

Compute the Chao-Wang-Jost estimate of the Shannon entropy of data in nats.

\[\hat{H}_{\tiny{CWJ}} = \sum_{1 \leq h_i \leq N-1} \frac{h_i}{N} \left(\sum_{k=h_i}^{N-1} \frac{1}{k} \right) + \frac{f_1}{N} (1 - A)^{-N + 1} \left\{ - \log(A) - \sum_{r=1}^{N-1} \frac{1}{r} (1 - A)^r \right\}\]

with

\[A = \begin{cases} \frac{2 f_2}{(N-1) f_1 + 2 f_2} \, & \text{if} \, f_2 > 0 \\ \frac{2}{(N-1)(f_1 - 1) + 1} \, & \text{if} \, f_2 = 0, \; f_1 \neq 0 \\ 1, & \text{if} \, f_1 = f_2 = 0 \end{cases}\]

where $f_1$ is the number of singletons and $f_2$ the number of doubletons in data.
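
A minimal sketch; the illustrative count vector below has three singletons ($f_1 = 3$) and one doubleton ($f_2 = 1$):

using DiscreteEntropy

data = from_counts([5, 3, 2, 1, 1, 1])     # f1 = 3, f2 = 1
chao_wang_jost(data)                       # estimate in nats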

Notes

The algorithm is a slightly modified port of the one used in the entropart R library.

External Links

Entropy and the species accumulation curve


Bayesian Estimators

DiscreteEntropy.bayes (Function)
bayes(data::CountData, α::AbstractFloat; K=nothing)

Compute an estimate of Shannon entropy given data and a concentration parameter $α$. If K is not provided, then the observed support size in data is used.

\[\hat{H}_{\text{Bayes}} = - \sum_{k=1}^{K} \hat{p}_k^{\text{Bayes}} \; \log \hat{p}_k^{\text{Bayes}}\]

where

\[\hat{p}_k^{\text{Bayes}} = \frac{h_k + α}{N + A}\]

and

\[A = \sum_{x=1}^{K} α_{x}\]

In addition to setting your own α, we have the following suggested choices (a short usage sketch follows this list):

  1. jeffrey : α = 0.5
  2. laplace: α = 1.0
  3. schurmann_grassberger: α = 1 / K
  4. minimax: α = √N / K
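
A short sketch of bayes with an explicit concentration parameter, next to the suggested choices; this assumes the prior types listed at the top of this page (Laplace, Jeffrey, Minimax) can be passed to estimate_h like any other estimator type:

using DiscreteEntropy

data = from_counts([4, 3, 2, 1, 1])

bayes(data, 1.0)              # hand-picked α (here the Laplace choice, α = 1.0)
estimate_h(data, Laplace)     # the same suggested prior via its estimator type
estimate_h(data, Jeffrey)     # α = 0.5
minimax(data)                 # α = √N / K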
DiscreteEntropy.minimax (Function)
 minimax(data::CountData; K=nothing)

Compute the Bayes estimate of entropy, with $α = \frac{\sqrt{N}}{K}$, where N = data.N and K = data.K if K is nothing.

DiscreteEntropy.nsb (Function)
nsb(data::CountData, K=data.K; verbose=false)

Return the Bayesian estimate of the Shannon entropy of data, using the Nemenman, Shafee, Bialek (NSB) algorithm.

\[\hat{H}^{\text{NSB}} = \frac{ \int_0^{\ln(K)} d\xi \, \rho(\xi \mid \textbf{n}) \langle H^m \rangle_{\beta (\xi)} } { \int_0^{\ln(K)} d\xi \, \rho(\xi \mid \textbf{n})}\]

where

\[\rho(\xi \mid \textbf{n}) = \mathcal{P}(\beta (\xi)) \frac{ \Gamma(\kappa(\xi))}{\Gamma(N + \kappa(\xi))} \prod_{i=1}^K \frac{\Gamma(n_i + \beta(\xi))}{\Gamma(\beta(\xi))}\]

If there are no coincidences in the data, NSB returns NaN. If verbose is true, NSB will warn you of errors.
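
A minimal sketch; note the NaN behaviour when the sample contains no repeated outcomes:

using DiscreteEntropy

nsb(from_counts([4, 3, 2, 1, 1]))         # coincidences present: returns an estimate in nats
nsb(from_counts([1, 1, 1, 1]))            # no coincidences: returns NaN
nsb(from_counts([4, 3, 2, 1, 1]), 100)    # assume a larger, partly unobserved, support of 100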

DiscreteEntropy.ansb (Function)
ansb(data::CountData; undersampled::Float64=0.1)::Float64

Return the Asymptotic NSB estimation of the Shannon entropy of data in nats.

See Asymptotic NSB estimator (equations 11 and 12)

\[\hat{H}_{\tiny{ANSB}} = (C_\gamma - \log(2)) + 2 \log(N) - \psi(\Delta)\]

where $C_\gamma$ is the Euler-Mascheroni constant ($\approx 0.57721...$), $\psi$ is the digamma function and $\Delta$ is the number of coincidences in the data.

This is designed for the extremely undersampled regime (K ~ N) and diverges with N when well-sampled. ANSB requires that $N/K → 0$, which we set to be $N/K < 0.1$ by default in undersampled. You can, of course, experiment with this value, but the behaviour might be unpredictable.

If there are no coincidences in the data, ANSB returns NaN
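
A minimal sketch; the undersampled keyword only sets the N/K cut-off described above, and the count vector is purely illustrative:

using DiscreteEntropy

data = from_counts([2, 1, 1, 1, 1, 1, 1, 1, 1, 1])   # one doubleton, nine singletons: Δ = 1
ansb(data)                                           # asymptotic NSB estimate in nats
ansb(data; undersampled=0.5)                         # use a different N/K cut-off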

External Links

Asymptotic NSB estimator (equations 11 and 12)


Other Estimators

DiscreteEntropy.wu_yang_poly (Function)

wu_yang_poly(data::CountData; L::Int=0, M::Float64=0.0, N::Int=0)

Compute the Wu-Yang polynomial estimate of the Shannon entropy of data. This implementation uses the precomputed coefficients found here.

Optional Parameters

  • L::Int : polynomial degree, default = floor(1.6 * log(data.K))
  • M::Float64 : endpoint of approximation interval, default = 3.5 * log(data.K)
  • N::Int : threshold for polynomial estimator application, default = floor(1.6 * log(data.K))
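
A minimal sketch; the keyword values in the second call are purely illustrative overrides of the defaults above:

using DiscreteEntropy

data = from_counts([4, 3, 2, 1, 1])
wu_yang_poly(data)                    # defaults derived from data.K
wu_yang_poly(data; L=2, M=6.0)        # hand-picked degree and interval endpoint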

External Links

Minimax Rates of Entropy Estimation on Large Alphabets via Best Polynomial Approximation

DiscreteEntropy.pert (Function)
pert(data::CountData, estimator::Type{T}) where {T<:AbstractEstimator}
pert(data::CountData, b::Type{T}, c::Type{T}) where {T<:AbstractEstimator}
pert(data::CountData, a::Type{T}, b::Type{T1}, c::Type{T2}) where {T,T1,T2<:AbstractEstimator}

A Pert estimate of entropy, where

  • a = best estimate
  • b = most likely estimate
  • c = worst case estimate

\[H = \frac{a + 4b + c}{6}\]

where the default estimators are $a$ = MaximumLikelihood, $b$ = ChaoShen (the most likely value) and $c$ = ANSB. The user can, of course, specify any combination of estimators they want.
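
A sketch of the single-estimator and three-estimator forms; Zhang is used here only as an illustrative choice:

using DiscreteEntropy

data = from_counts([4, 3, 2, 1, 1])
pert(data, Zhang)                                  # single-estimator form
pert(data, MaximumLikelihood, ChaoShen, ANSB)      # explicit a (best), b (most likely), c (worst case)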

DiscreteEntropy.jackknife (Function)
 jackknife(data::CountData, estimator::Type{T}; corrected=false) where {T<:AbstractEstimator}

Compute the jackknifed estimate of estimator on data.

If corrected is true, then the variance is scaled with data.N - 1, else it is scaled with data.N. corrected has no effect on the entropy estimation.
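
A minimal sketch, assuming the same (estimate, variance) return shape as jackknife_mle above:

using DiscreteEntropy

data = from_counts([4, 3, 2, 1, 1])
h, v   = jackknife(data, ChaoShen)                   # jackknifed Chao-Shen estimate and variance
h2, v2 = jackknife(data, ChaoShen; corrected=true)   # variance scaled by N - 1 instead of N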

DiscreteEntropy.bayesian_bootstrap (Function)
 bayesian_bootstrap(samples::SampleVector, estimator::Type{T}, reps, seed, concentration) where {T<:AbstractEstimator}

Compute a Bayesian bootstrap resampling of samples for estimation with estimator, where reps is the number of resamplings to perform, seed is the random seed and concentration is the concentration parameter of the Dirichlet distribution.

External Links

The Bayesian Bootstrap


Types

Estimator types for developers. Estimators are either parameterised or non-parameterised.