Estimators
We split the estimators into two broad categories, which we call Frequentist and Bayesian. We also have a few composite estimators that take either an averaging or a resampling approach to estimation.
estimate_h is parameterised on the type of the estimator (a usage sketch follows the list below). The complete list of types is currently:
- MaximumLikelihood
- JackknifeMLE
- MillerMadow
- Grassberger
- ChaoShen
- Zhang
- Bonachela
- Shrink
- ChaoWangJost
- Unseen
- Schurmann
- SchurmannGeneralised
- BUB
- Bayes
- NSB
- PYM
- ANSB
- Jeffrey
- Laplace
- SchurmannGrassberger
- Minimax
- PERT
- WuYang
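For example, a minimal sketch of this calling convention, assuming the estimate_h(data, EstimatorType) form and the from_counts constructor used in the examples further down this page:

using DiscreteEntropy

data = from_counts([1, 2, 3, 4, 5, 4, 3, 2, 1])

# dispatch on the estimator type; each call returns an entropy estimate in nats
h_ml = estimate_h(data, MaximumLikelihood)
h_cs = estimate_h(data, ChaoShen)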
Frequentist Estimators
DiscreteEntropy.maximum_likelihood
— Function maximum_likelihood(data::CountData)::Float64
Compute the maximum likelihood estimate of the Shannon entropy of data, in nats.
\[\hat{H}_{\tiny{ML}} = - \sum_{i=1}^K p_i \log(p_i)\]
or equivalently
\[\hat{H}_{\tiny{ML}} = \log(N) - \frac{1}{N} \sum_{i=1}^{K}h_i \log(h_i)\]
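Example
A short sketch comparing the direct call with the plug-in formula above; the manual computation is included only for illustration:

using DiscreteEntropy

n = [1, 2, 3, 4, 5, 4, 3, 2, 1]
h = maximum_likelihood(from_counts(n))

# the same value from the plug-in formula, computed by hand (log is natural log, so nats)
p = n ./ sum(n)
h_manual = -sum(p .* log.(p))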
DiscreteEntropy.jackknife_mle
— Function jackknife_mle(data::CountData; corrected=false)::Tuple{AbstractFloat, AbstractFloat}
Compute the jackknifed maximum_likelihood estimate of data and the variance of the jackknifing (not the variance of the estimator itself).
If corrected is true, the variance is scaled with data.N - 1, otherwise with data.N. corrected has no effect on the entropy estimation.
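Example
A minimal sketch of the returned tuple (estimate, jackknife variance), reusing the from_counts constructor from the other examples on this page:

using DiscreteEntropy

data = from_counts([1, 2, 3, 4, 5, 4, 3, 2, 1])

# returns (entropy estimate in nats, variance of the jackknifing)
h, v = jackknife_mle(data)

# corrected=true only rescales the variance; the entropy estimate is unchanged
h_c, v_c = jackknife_mle(data; corrected=true)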
External Links
Estimation of the size of a closed population when capture probabilities vary among animals
DiscreteEntropy.miller_madow
— Function miller_madow(data::CountData)
Compute the Miller-Madow estimate of Shannon entropy, which adds a positive correction term based on the total number of samples seen (N) and the support size (K).
\[\hat{H}_{\tiny{MM}} = \hat{H}_{\tiny{ML}} + \frac{K - 1}{2N}\]
DiscreteEntropy.grassberger
— Function grassberger(data::CountData)
Compute the Grassberger (1988) estimate of the Shannon entropy of data, in nats.
\[\hat{H}_{\tiny{Gr88}} = \sum_i \frac{h_i}{N} \left(\log(N) - \psi(h_i) - \frac{(-1)^{h_i}}{h_i + 1} \right)\]
Equation 13 from Finite sample corrections to entropy and dimension estimates
DiscreteEntropy.schurmann
— Function schurmann(data::CountData, ξ::Float64 = ℯ^(-1/2))
Compute the Schurmann estimate of the Shannon entropy of data, in nats.
\[\hat{H}_{SHU} = \psi(N) - \frac{1}{N} \sum_{i=1}^{K} \, h_i \left( \psi(h_i) + (-1)^{h_i} \int_0^{\frac{1}{\xi} - 1} \frac{t^{h_i - 1}}{1+t} \, dt \right)\]
There is no ideal value for $\xi$; however, the paper suggests $e^{-1/2} \approx 0.6$.
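Example
A small sketch of varying ξ; the default is the paper's suggested value:

using DiscreteEntropy

data = from_counts([1, 2, 3, 4, 5, 4, 3, 2, 1])

h_default = schurmann(data)        # ξ = ℯ^(-1/2) ≈ 0.6065
h_custom  = schurmann(data, 1.0)   # an alternative choice of ξ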
External Links
DiscreteEntropy.schurmann_generalised
— Function schurmann_generalised(data::CountVector, xis::XiVector{T}) where {T<:Real}
\[\hat{H}_{\tiny{SHU}} = \psi(N) - \frac{1}{N} \sum_{i=1}^{K} \, h_i \left( \psi(h_i) + (-1)^{h_i} \int_0^{\frac{1}{\xi_i} - 1} \frac{t^{h_i - 1}}{1+t} \, dt \right) \]
Compute the generalised Schurmann entropy estimate, given a countvector data and a xivector xis, which must both be the same length.
schurmann_generalised(data::CountVector, xis::Distribution, scalar=false)
Compute the generalised Schurmann entropy estimate, given a countvector data and a distribution xis from which the ξ values are drawn.
External Links
DiscreteEntropy.bub
— Function bub(data::CountData; k_max=11, truncate=false, lambda_0=0.0)
Compute the Best Upper Bound (BUB) estimation of Shannon entropy, where k_max is a degree-of-freedom parameter. Paninski states that k_max ~ 10 is optimal for most applications. lambda_0 is the Lagrange multiplier on $a_0$ (see the paper for details); it can safely be left at 0 for most applications. truncate reduces the number of significant digits in intermediate floating point calculations; it exists to bring the output of this function closer to the original Matlab implementation. Leaving it at false usually results in a slightly higher entropy estimate.
Example
n = [1,2,3,4,5,4,3,2,1]
(h, MM) = bub(from_counts(n))
(2.475817360451392, 0.6542542616181388)
where h is the estimate of the Shannon entropy in nats and MM is the upper bound on the RMS error.
External Links
DiscreteEntropy.chao_shen
— Function chao_shen(data::CountData)
Compute the Chao-Shen estimate of the Shannon entropy of data, in nats.
\[\hat{H}_{CS} = - \sum_{i=1}^{K} \frac{\hat{p}_i^{CS} \log \hat{p}_i^{CS}}{1 - (1 - \hat{p}_i^{CS})^N}\]
where
\[\hat{p}_i^{CS} = \left(1 - \frac{f_1}{N}\right) \hat{p}_i^{ML}\]
and $f_1$ is the number of singletons in data.
DiscreteEntropy.zhang
— Function zhang(data::CountData)
Compute the Zhang estimate of the Shannon entropy of data, in nats.
The recommended definition of Zhang's estimator is from Grabchak et al.
\[\hat{H}_Z = \sum_{i=1}^K \hat{p}_i \sum_{v=1}^{N - h_i} \frac{1}{v} \prod_{j=0}^{v-1} \left( 1 + \frac{1 - h_i}{N - 1 - j} \right)\]
The actual algorithm comes from Fast Calculation of entropy with Zhang's estimator by Lozano et al.
External Links
DiscreteEntropy.bonachela
— Function bonachela(data::CountData)
Compute the Bonachela estimate of the Shannon entropy of data, in nats.
\[\hat{H}_{B} = \frac{1}{N+2} \sum_{i=1}^{K} \left( (h_i + 1) \sum_{j=h_i + 2}^{N+2} \frac{1}{j} \right)\]
External Links
DiscreteEntropy.shrink
— Function shrink(data::CountData)
Compute the Shrinkage, or James-Stein, estimate of the Shannon entropy of data, in nats.
\[\hat{H}_{\tiny{SHR}} = - \sum_{i=1}^{K} \hat{p}_x^{\tiny{SHR}} \log(\hat{p}_x^{\tiny{SHR}})\]
where
\[\hat{p}_x^{\tiny{SHR}} = \lambda t_x + (1 - \lambda) \hat{p}_x^{\tiny{ML}}\]
and
\[\lambda = \frac{ 1 - \sum_{x=1}^{K} (\hat{p}_x^{\tiny{ML}})^2}{(N-1) \sum_{x=1}^K (t_x - \hat{p}_x^{\tiny{ML}})^2}\]
with
\[t_x = 1 / K\]
Notes
Based on the implementation in the R package entropy
External Links
DiscreteEntropy.chao_wang_jost
— Function chao_wang_jost(data::CountData)
Compute the Chao-Wang-Jost estimate of the Shannon entropy of data, in nats.
\[\hat{H}_{\tiny{CWJ}} = \sum_{1 \leq h_i \leq N-1} \frac{h_i}{N} \left(\sum_{k=h_i}^{N-1} \frac{1}{k} \right) + \frac{f_1}{N} (1 - A)^{-N + 1} \left\{ - \log(A) - \sum_{r=1}^{N-1} \frac{1}{r} (1 - A)^r \right\}\]
with
\[A = \begin{cases} \frac{2 f_2}{(N-1) f_1 + 2 f_2} \, & \text{if} \, f_2 > 0 \\ \frac{2}{(N-1)(f_1 - 1) + 1} \, & \text{if} \, f_2 = 0, \; f_1 \neq 0 \\ 1, & \text{if} \, f_1 = f_2 = 0 \end{cases}\]
where $f_1$ is the number of singletons and $f_2$ the number of doubletons in data.
Notes
The algorithm is a slightly modified port of that used in the entropart R library.
External Links
DiscreteEntropy.unseen
— Function unseen(data::CountData)
Compute the Unseen estimate of Shannon entropy.
Example
n = [1,2,3,4,5,4,3,2,1]
h = unseen(from_counts(n))
1.4748194918254784
External Links
Estimating the Unseen: Improved Estimators for Entropy and Other Properties
Bayesian Estimators
DiscreteEntropy.bayes
— Function bayes(data::CountData, α::AbstractFloat; K=nothing)
Compute an estimate of Shannon entropy given data and a concentration parameter $α$. If K is not provided, the observed support size in data is used.
\[\hat{H}_{\text{Bayes}} = - \sum_{k=1}^{K} \hat{p}_k^{\text{Bayes}} \; \log \hat{p}_k^{\text{Bayes}}\]
where
\[\hat{p}_k^{\text{Bayes}} = \frac{h_k + α_k}{N + A}\]
and
\[A = \sum_{k=1}^{K} α_{k}\]
In addition to setting your own α, we have the following suggested choices (a usage sketch follows the list):
- jeffrey : α = 0.5
- laplace: α = 1.0
- schurmann_grassberger: α = 1 / K
- minimax: α = √N / K
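Example
A sketch of these choices, assuming the convenience wrappers documented below simply fix α:

using DiscreteEntropy

data = from_counts([1, 2, 3, 4, 5, 4, 3, 2, 1])

h_bayes   = bayes(data, 0.5)   # explicit concentration parameter α = 0.5
h_jeffrey = jeffrey(data)      # the Jeffrey wrapper, also α = 0.5
h_laplace = laplace(data)      # α = 1.0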
DiscreteEntropy.jeffrey
— Function jeffrey(data::CountData; K=nothing)
Compute the bayes estimate of entropy with $α = 0.5$.
DiscreteEntropy.laplace
— Function laplace(data::CountData; K=nothing)
Compute the bayes estimate of entropy with $α = 1.0$.
DiscreteEntropy.schurmann_grassberger
— Function schurmann_grassberger(data::CountData; K=nothing)
Compute the bayes estimate of entropy with $α = \frac{1}{K}$. If K is nothing, then data.K is used.
DiscreteEntropy.minimax
— Function minimax(data::CountData; K=nothing)
Compute the bayes estimate of entropy with $α = \frac{\sqrt{data.N}}{K}$, where K = data.K if K is nothing.
DiscreteEntropy.nsb
— Function nsb(data::CountData, K=data.K; verbose=false)
Return the Bayesian estimate of the Shannon entropy of data, using the Nemenman, Shafee, Bialek (NSB) algorithm.
\[\hat{H}^{\text{NSB}} = \frac{ \int_0^{\ln(K)} d\xi \, \rho(\xi \mid \textbf{n}) \langle H^m \rangle_{\beta (\xi)} } { \int_0^{\ln(K)} d\xi \, \rho(\xi \mid \textbf{n})}\]
where
\[\rho(\xi \mid \textbf{n}) = \mathcal{P}(\beta (\xi)) \frac{ \Gamma(\kappa(\xi))}{\Gamma(N + \kappa(\xi))} \prod_{i=1}^K \frac{\Gamma(n_i + \beta(\xi))}{\Gamma(\beta(\xi))}\]
If there are no coincidences in the data, NSB returns NaN. If verbose is true, NSB will warn you of errors.
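Example
A minimal sketch of the K parameter and the no-coincidence behaviour described above:

using DiscreteEntropy

data = from_counts([1, 2, 3, 4, 5, 4, 3, 2, 1])

h_obs = nsb(data)         # uses the observed support size data.K
h_big = nsb(data, 100)    # assume a larger support size of 100

# with no coincidences (every count equal to 1) the estimator returns NaN
nsb(from_counts([1, 1, 1, 1]); verbose=true)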
DiscreteEntropy.ansb
— Function ansb(data::CountData; undersampled::Float64=0.1)::Float64
Return the Asymptotic NSB (ANSB) estimate of the Shannon entropy of data, in nats.
See Asymptotic NSB estimator (equations 11 and 12)
\[\hat{H}_{\tiny{ANSB}} = (C_\gamma - \log(2)) + 2 \log(N) - \psi(\Delta)\]
where $C_\gamma$ is the Euler-Mascheroni constant ($\approx 0.57721$), $\psi$ is the digamma function and $\Delta$ is the number of coincidences in the data.
This is designed for the extremely undersampled regime ($N \ll K$) and diverges with N when well-sampled. ANSB requires that $N/K \to 0$, which we set to $N/K < 0.1$ by default via undersampled. You can, of course, experiment with this value, but the behaviour might be unpredictable.
If there are no coincidences in the data, ANSB returns NaN.
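Example
A minimal sketch; the comments only restate the defaults given above:

using DiscreteEntropy

# mostly singletons, a single coincidence
data = from_counts([1, 1, 1, 1, 1, 1, 1, 1, 2])

h  = ansb(data)                      # default undersampled threshold of 0.1
h2 = ansb(data; undersampled=0.5)    # experiment with a looser N/K threshold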
External Links
Asymptotic NSB estimator (equations 11 and 12)
DiscreteEntropy.pym
— Function pym(data::CountData)::Float64
Compute the Pitman-Yor Mixture (PYM) estimate of the Shannon entropy of data. This is a more or less faithful port of the original Matlab code to Julia.
Other Estimators
DiscreteEntropy.wu_yang_poly
— Function wu_yang_poly(data::CountData; L::Int=0, M::Float64=0.0, N::Int=0)
Compute the Wu-Yang polynomial estimate of data. This implementation uses the precomputed coefficients found here.
Optional Parameters
- L::Int : polynomial degree, default = floor(1.6 * log(data.K))
- M::Float64 : endpoint of approximation interval, default = 3.5 * log(data.K)
- N::Int : threshold for polynomial estimator application, default = floor(1.6 * log(data.K))
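Example
A short sketch of the optional parameters; the overridden values are arbitrary and only for illustration:

using DiscreteEntropy

data = from_counts([1, 2, 3, 4, 5, 4, 3, 2, 1])

h = wu_yang_poly(data)                           # defaults derived from data.K as listed above
h_custom = wu_yang_poly(data; L=4, M=10.0, N=4)  # override the polynomial parameters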
External Links
Minimax Rates of Entropy Estimation on Large Alphabets via Best Polynomial Approximation
DiscreteEntropy.pert
— Function pert(data::CountData, estimator::Type{T}) where {T<:AbstractEstimator}
pert(data::CountData, b::Type{T}, c::Type{T}) where {T<:AbstractEstimator}
pert(data::CountData, a::Type{T}, b::Type{T1}, c::Type{T2}) where {T,T1,T2<:AbstractEstimator}
A PERT estimate of entropy, where
- a = the best-case estimate
- b = the most likely estimate
- c = the worst-case estimate
\[H = \frac{a + 4b + c}{6}\]
where the default estimators are $a$ = MaximumLikelihood, $b$ = ChaoShen (the most likely value) and $c$ = ANSB. The user can, of course, specify any combination of estimators they want.
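Example
A short sketch of two of the signatures, using estimator types from the list at the top of this page:

using DiscreteEntropy

data = from_counts([1, 2, 3, 4, 5, 4, 3, 2, 1])

h1 = pert(data, ChaoShen)                          # single-estimator form
h3 = pert(data, MaximumLikelihood, Zhang, ANSB)    # a = best, b = most likely, c = worst case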
DiscreteEntropy.jackknife
— Function jackknife(data::CountData, estimator::Type{T}; corrected=false) where {T<:AbstractEstimator}
Compute the jackknifed estimate of estimator on data.
If corrected is true, the variance is scaled with data.N - 1, otherwise with data.N. corrected has no effect on the entropy estimation.
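Example
A minimal sketch, assuming the same (estimate, variance) return shape as jackknife_mle:

using DiscreteEntropy

data = from_counts([1, 2, 3, 4, 5, 4, 3, 2, 1])

# jackknife an arbitrary estimator type
h, v = jackknife(data, ChaoShen)
h_c, v_c = jackknife(data, ChaoShen; corrected=true)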
DiscreteEntropy.bayesian_bootstrap
— Function bayesian_bootstrap(samples::SampleVector, estimator::Type{T}, reps, seed, concentration) where {T<:AbstractEstimator}
Compute a Bayesian bootstrap resampling of samples for estimation with estimator, where reps is the number of resamplings to perform, seed is the random seed and concentration is the concentration parameter for a Dirichlet distribution.
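Example
A hedged sketch; svector is assumed here to be a helper that wraps raw samples as a SampleVector (substitute whatever constructor your version provides):

using DiscreteEntropy

# raw samples rather than counts; svector is an assumption, see above
samples = svector([1, 2, 1, 3, 2, 1, 4, 2, 5, 1])

# 1000 Bayesian bootstrap resamples, seed 42, Dirichlet concentration 1.0
h = bayesian_bootstrap(samples, MaximumLikelihood, 1000, 42, 1.0)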
External Links
Types
Estimator types for developers. Estimators are either parameterised or non-parameterised.
DiscreteEntropy.AbstractEstimator
— TypeAbstractEstimator
Supertype for NonParameterised and Parameterised entropy estimators.
DiscreteEntropy.NonParameterisedEstimator
— TypeNonParameterisedEstimator
Type for NonParameterised entropy estimators.
DiscreteEntropy.ParameterisedEstimator
— TypeParameterisedEstimator
Type for Parameterised entropy estimators.