Estimators

We split the estimators into two broad categories, which we call Frequentist and Bayesian. We also have a few composite estimators that take either an averaging or a resampling approach to estimation.

estimate_h is parameterised on the type of the estimator; a short usage sketch follows the list below. The complete list of types is currently:

  • MaximumLikelihood
  • JackknifeMLE
  • MillerMadow
  • Grassberger
  • ChaoShen
  • Zhang
  • Bonachela
  • Shrink
  • ChaoWangJost
  • Unseen
  • Schurmann
  • SchurmannGeneralised
  • BUB
  • Bayes
  • NSB
  • PYM
  • ANSB
  • Jeffrey
  • Laplace
  • SchurmannGrassberger
  • Minimax
  • PERT
  • WuYang
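
For example, any type in this list can be passed as the second argument to estimate_h. A minimal sketch, building the count data with from_counts (shown later in these docs); the count vector is purely illustrative:

using DiscreteEntropy

data = from_counts([4, 3, 2, 1, 1])

estimate_h(data, MaximumLikelihood)   # plug-in (frequentist) estimate, in nats
estimate_h(data, ChaoShen)            # a bias-corrected frequentist estimator
estimate_h(data, NSB)                 # a Bayesian estimator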

Frequentist Estimators

DiscreteEntropy.maximum_likelihood (Function)
maximum_likelihood(data::CountData)::Float64

Compute the maximum likelihood estimation of Shannon entropy of data in nats.

\[\hat{H}_{\tiny{ML}} = - \sum_{i=1}^K p_i \log(p_i)\]

or equivalently

\[\hat{H}_{\tiny{ML}} = \log(N) - \frac{1}{N} \sum_{i=1}^{K}h_i \log(h_i)\]
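
As a quick sanity check of the equivalence, the following sketch computes both forms directly from a raw histogram in plain Julia (no package calls needed):

h = [4, 3, 2, 1]                             # histogram counts hᵢ
N = sum(h)
p = h ./ N

direct = -sum(p .* log.(p))                  # -Σ pᵢ log pᵢ
alt    = log(N) - sum(h .* log.(h)) / N      # log N - (1/N) Σ hᵢ log hᵢ

direct ≈ alt                                 # true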

DiscreteEntropy.jackknife_mle (Function)
jackknife_mle(data::CountData; corrected=false)::Tuple{AbstractFloat, AbstractFloat}

Compute the jackknifed maximum_likelihood estimate of data and the variance of the jackknifing (not the variance of the estimator itself).

If corrected is true, then the variance is scaled with data.N - 1, else it is scaled with data.N. corrected has no effect on the entropy estimation.
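
A minimal usage sketch (the count vector is only illustrative); the second element of the returned tuple is the jackknife variance:

using DiscreteEntropy

data = from_counts([4, 3, 2, 1, 1])
h, v   = jackknife_mle(data)                     # estimate in nats, jackknife variance
h2, v2 = jackknife_mle(data; corrected=true)     # same estimate, variance scaled by N - 1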

External Links

Estimation of the size of a closed population when capture probabilities vary among animals

DiscreteEntropy.miller_madow (Function)
miller_madow(data::CountData)

Compute the Miller-Madow estimate of Shannon entropy, which adds a positive bias correction based on the total number of samples seen (N) and the support size (K).

\[\hat{H}_{\tiny{MM}} = \hat{H}_{\tiny{ML}} + \frac{K - 1}{2N}\]
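
A short sketch of the correction in practice, using the data.N and data.K fields referenced elsewhere in these docs; the count vector is illustrative:

using DiscreteEntropy

data = from_counts([4, 3, 2, 1, 1])
h_mm = miller_madow(data)
h_ml = maximum_likelihood(data)

h_mm ≈ h_ml + (data.K - 1) / (2 * data.N)    # the (K - 1) / 2N correction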

DiscreteEntropy.schurmann (Function)
 schurmann(data::CountData, ξ::Float64 = ℯ^(-1/2))

Compute the Schurmann estimate of Shannon entropy of data in nats.

\[\hat{H}_{SHU} = \psi(N) - \frac{1}{N} \sum_{i=1}^{K} \, h_i \left( \psi(h_i) + (-1)^{h_i} ∫_0^{\frac{1}{\xi} - 1} \frac{t^{h_i}-1}{1+t}dt \right)\]

There is no single ideal value for $\xi$; however, the paper suggests $\xi = e^{-1/2} \approx 0.6$.
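
A minimal sketch; the second positional argument overrides the default $\xi$:

using DiscreteEntropy

data = from_counts([4, 3, 2, 1, 1])
schurmann(data)            # default ξ = exp(-1/2) ≈ 0.6065
schurmann(data, 1.0)       # an alternative choice of ξ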

External Links

schurmann

DiscreteEntropy.schurmann_generalised (Function)
schurmann_generalised(data::CountVector, xis::XiVector{T}) where {T<:Real}

\[\hat{H}_{\tiny{SHU}} = \psi(N) - \frac{1}{N} \sum_{i=1}^{K} \, h_i \left( \psi(h_i) + (-1)^{h_i} ∫_0^{\frac{1}{\xi_i} - 1} \frac{t^{h_i}-1}{1+t}dt \right) \]

Compute the generalised Schurmann entropy estimate, given a countvector data and a xivector xis, which must be of equal length.

schurmann_generalised(data::CountVector, xis::Distribution, scalar=false)

Compute the generalised Schurmann entropy estimate, given a countvector data and a Distribution xis from which the ξ values are taken.

External Links

schurmann_generalised

DiscreteEntropy.bub (Function)
 bub(data::CountData; k_max=11, truncate=false, lambda_0=0.0)

Compute the Best Upper Bound (BUB) estimate of Shannon entropy, where

  • k_max is a degree-of-freedom parameter. Paninski states that k_max ≈ 10 is optimal for most applications.
  • lambda_0 is the Lagrange multiplier on $a_0$ (see the paper for details). This can be safely left at 0 for most applications.
  • truncate reduces the number of significant digits in intermediate floating-point calculations. This exists to bring the output of this function closer to the original Matlab implementation; leaving it at false usually results in a slightly higher entropy estimate.

Example

n = [1,2,3,4,5,4,3,2,1]
(h, MM) = bub(from_counts(n))
(2.475817360451392, 0.6542542616181388)

where h is the estimate of Shannon entropy in nats and MM is the upper bound on the RMS error.

External Links

Estimation of Entropy and Mutual Information

DiscreteEntropy.chao_shen (Function)
chao_shen(data::CountData)

Compute the Chao-Shen estimate of the Shannon entropy of data in nats.

\[\hat{H}_{CS} = - \sum_{i=1}^{K} \frac{\hat{p}_i^{CS} \log \hat{p}_i^{CS}}{1 - (1 - \hat{p}_i^{CS})^N}\]

where

\[\hat{p}_i^{CS} = (1 - \frac{1 - \hat{p}_i^{ML}}{N}) \hat{p}_i^{ML}\]
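
A minimal call sketch; on undersampled data the Chao-Shen estimate is typically larger than the plug-in estimate, which is biased downwards:

using DiscreteEntropy

data = from_counts([3, 2, 1, 1, 1, 1])        # several singletons, so low sample coverage
chao_shen(data)                               # usually exceeds maximum_likelihood(data) here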

DiscreteEntropy.shrink (Function)
shrink(data::CountData)

Compute the Shrinkage, or James-Stein estimator of Shannon entropy for data in nats.

\[\hat{H}_{\tiny{SHR}} = - \sum_{i=1}^{K} \hat{p}_x^{\tiny{SHR}} \log(\hat{p}_x^{\tiny{SHR}})\]

where

\[\hat{p}_x^{\tiny{SHR}} = \lambda t_x + (1 - \lambda) \hat{p}_x^{\tiny{ML}}\]

and

\[\lambda = \frac{ 1 - \sum_{x=1}^{K} (\hat{p}_x^{\tiny{ML}})^2}{(n-1) \sum_{x=1}^K (t_x - \hat{p}_x^{\tiny{ML}})^2}\]

with

\[t_x = 1 / K\]
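
A minimal sketch; the shrinkage target $t_x$ is the uniform distribution over the K observed bins, and the count vector is illustrative:

using DiscreteEntropy

data = from_counts([8, 4, 2, 1, 1])
shrink(data)                   # shrinkage estimate in nats
maximum_likelihood(data)       # plug-in estimate, for comparison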

Notes

Based on the implementation in the R package entropy

External Links

Entropy Inference and the James-Stein Estimator

DiscreteEntropy.chao_wang_jost (Function)
chao_wang_jost(data::CountData)

Compute the Chao-Wang-Jost estimate of the Shannon entropy of data in nats.

\[\hat{H}_{\tiny{CWJ}} = \sum_{1 \leq h_i \leq N-1} \frac{h_i}{N} \left(\sum_{k=h_i}^{N-1} \frac{1}{k} \right) + \frac{f_1}{N} (1 - A)^{-N + 1} \left\{ - \log(A) - \sum_{r=1}^{N-1} \frac{1}{r} (1 - A)^r \right\}\]

with

\[A = \begin{cases} \frac{2 f_2}{(N-1) f_1 + 2 f_2} \, & \text{if} \, f_2 > 0 \\ \frac{2}{(N-1)(f_1 - 1) + 1} \, & \text{if} \, f_2 = 0, \; f_1 \neq 0 \\ 1, & \text{if} \, f_1 = f_2 = 0 \end{cases}\]

where $f_1$ is the number of singletons and $f_2$ the number of doubletons in data.
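
A minimal sketch; the illustrative count vector below has three singletons ($f_1 = 3$) and one doubleton ($f_2 = 1$):

using DiscreteEntropy

data = from_counts([5, 3, 2, 1, 1, 1])     # f1 = 3, f2 = 1
chao_wang_jost(data)                       # estimate in nats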

Notes

The algorithm is a slightly modified port of the one used in the entropart R library.

External Links

Entropy and the species accumulation curve


Bayesian Estimators

DiscreteEntropy.bayes (Function)
bayes(data::CountData, α::AbstractFloat; K=nothing)

Compute an estimate of Shannon entropy given data and a concentration parameter $α$. If K is not provided, then the observed support size in data is used.

\[\hat{H}_{\text{Bayes}} = - \sum_{k=1}^{K} \hat{p}_k^{\text{Bayes}} \; \log \hat{p}_k^{\text{Bayes}}\]

where

\[\hat{p}_k^{\text{Bayes}} = \frac{h_k + α}{N + A}\]

and

\[A = \sum_{x=1}^{K} α_{x}\]

In addition to setting your own α, we have the following suggested choices (a short usage sketch follows this list):

  1. jeffrey : α = 0.5
  2. laplace: α = 1.0
  3. schurmann_grassberger: α = 1 / K
  4. minimax: α = √N / K
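
A short sketch of bayes with an explicit concentration parameter, next to the suggested choices; this assumes the prior types listed at the top of this page (Laplace, Jeffrey, Minimax) can be passed to estimate_h like any other estimator type:

using DiscreteEntropy

data = from_counts([4, 3, 2, 1, 1])

bayes(data, 1.0)              # hand-picked α (here the Laplace choice, α = 1.0)
estimate_h(data, Laplace)     # the same suggested prior via its estimator type
estimate_h(data, Jeffrey)     # α = 0.5
minimax(data)                 # α = √N / K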
DiscreteEntropy.minimax (Function)
 minimax(data::CountData; K=nothing)

Compute the Bayes estimate of entropy, with $α = \frac{\sqrt{N}}{K}$, where N = data.N and K = data.K if K is nothing.

DiscreteEntropy.nsb (Function)
nsb(data::CountData, K=data.K; verbose=false)

Return the Bayesian estimate of the Shannon entropy of data, using the Nemenman, Shafee, Bialek (NSB) algorithm.

\[\hat{H}^{\text{NSB}} = \frac{ \int_0^{\ln(K)} d\xi \, \rho(\xi \mid \textbf{n}) \langle H^m \rangle_{\beta (\xi)} } { \int_0^{\ln(K)} d\xi \, \rho(\xi \mid \textbf{n})}\]

where

\[\rho(\xi \mid \textbf{n}) = \mathcal{P}(\beta (\xi)) \frac{ \Gamma(\kappa(\xi))}{\Gamma(N + \kappa(\xi))} \prod_{i=1}^K \frac{\Gamma(n_i + \beta(\xi))}{\Gamma(\beta(\xi))}\]

If there are no coincidences in the data, NSB returns NaN. If verbose is true, NSB will warn you of errors.
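
A minimal sketch; note the NaN behaviour when the sample contains no repeated outcomes:

using DiscreteEntropy

nsb(from_counts([4, 3, 2, 1, 1]))         # coincidences present: returns an estimate in nats
nsb(from_counts([1, 1, 1, 1]))            # no coincidences: returns NaN
nsb(from_counts([4, 3, 2, 1, 1]), 100)    # assume a larger, partly unobserved, support of 100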

DiscreteEntropy.ansb (Function)
ansb(data::CountData; undersampled::Float64=0.1)::Float64

Return the Asymptotic NSB estimation of the Shannon entropy of data in nats.

See Asymptotic NSB estimator (equations 11 and 12)

\[\hat{H}_{\tiny{ANSB}} = (C_\gamma - \log(2)) + 2 \log(N) - \psi(\Delta)\]

where $C_\gamma$ is the Euler-Mascheroni constant ($\approx 0.57721...$), $\psi$ is the digamma function and $\Delta$ is the number of coincidences in the data.

This is designed for the extremely undersampled regime (K ~ N) and diverges with N when well-sampled. ANSB requires that $N/K → 0$, which we set to be $N/K < 0.1$ by default in undersampled. You can, of course, experiment with this value, but the behaviour might be unpredictable.

If there are no coincidences in the data, ANSB returns NaN
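
A minimal sketch; the undersampled keyword only sets the N/K cut-off described above, and the count vector is purely illustrative:

using DiscreteEntropy

data = from_counts([2, 1, 1, 1, 1, 1, 1, 1, 1, 1])   # one doubleton, nine singletons: Δ = 1
ansb(data)                                           # asymptotic NSB estimate in nats
ansb(data; undersampled=0.5)                         # use a different N/K cut-off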

External Links

Asymptotic NSB estimator (equations 11 and 12)


Other Estimators

DiscreteEntropy.wu_yang_poly (Function)

wu_yang_poly(data::CountData; L::Int=0, M::Float64=0.0, N::Int=0)

Compute the Wu-Yang polynomial estimate of the Shannon entropy of data. This implementation uses the precomputed coefficients found here.

Optional Parameters

  • L::Int : polynomial degree, default = floor(1.6 * log(data.K))
  • M::Float64 : endpoint of approximation interval, default = 3.5 * log(data.K)
  • N::Int : threshold for polynomial estimator application, default = floor(1.6 * log(data.K))
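
A minimal sketch; the keyword values in the second call are purely illustrative overrides of the defaults above:

using DiscreteEntropy

data = from_counts([4, 3, 2, 1, 1])
wu_yang_poly(data)                    # defaults derived from data.K
wu_yang_poly(data; L=2, M=6.0)        # hand-picked degree and interval endpoint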

External Links

Minimax Rates of Entropy Estimation on Large Alphabets via Best Polynomial Approximation

DiscreteEntropy.pert (Function)
pert(data::CountData, estimator::Type{T}) where {T<:AbstractEstimator}
pert(data::CountData, b::Type{T}, c::Type{T}) where {T<:AbstractEstimator}
pert(data::CountData, a::Type{T}, b::Type{T1}, c::Type{T2}) where {T,T1,T2<:AbstractEstimator}

A Pert estimate of entropy, where

  • a = best estimate
  • b = most likely estimate
  • c = worst case estimate

\[H = \frac{a + 4b + c}{6}\]

where the default estimators are $a$ = MaximumLikelihood, $b$ = ChaoShen (the most likely value) and $c$ = ANSB. The user can, of course, specify any combination of estimators they want.
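
A sketch of the single-estimator and three-estimator forms; Zhang is used here only as an illustrative choice:

using DiscreteEntropy

data = from_counts([4, 3, 2, 1, 1])
pert(data, Zhang)                                  # single-estimator form
pert(data, MaximumLikelihood, ChaoShen, ANSB)      # explicit a (best), b (most likely), c (worst case)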

DiscreteEntropy.jackknife (Function)
 jackknife(data::CountData, estimator::Type{T}; corrected=false) where {T<:AbstractEstimator}

Compute the jackknifed estimate of estimator on data.

If corrected is true, then the variance is scaled with data.N - 1, else it is scaled with data.N. corrected has no effect on the entropy estimation.
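
A minimal sketch, assuming the same (estimate, variance) return shape as jackknife_mle above:

using DiscreteEntropy

data = from_counts([4, 3, 2, 1, 1])
h, v   = jackknife(data, ChaoShen)                   # jackknifed Chao-Shen estimate and variance
h2, v2 = jackknife(data, ChaoShen; corrected=true)   # variance scaled by N - 1 instead of N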

DiscreteEntropy.bayesian_bootstrap (Function)
 bayesian_bootstrap(samples::SampleVector, estimator::Type{T}, reps, seed, concentration) where {T<:AbstractEstimator}

Compute a Bayesian bootstrap resampling of samples for estimation with estimator, where reps is the number of resamplings to perform, seed is the random seed and concentration is the concentration parameter of the Dirichlet distribution.

External Links

The Bayesian Bootstrap


Types

Estimator types for developers. Estimators are either parameterised or non-parameterised.