# DiscreteEntropy

## Summary

`DiscreteEntropy` is a Julia package to estimate the Shannon entropy of discrete data. It implements a large collection of entropy estimators. At present, we have implementations for:
| Function | Type |
|---|---|
| `maximum_likelihood` | `MaximumLikelihood` |
| `jackknife_mle` | `JackknifeMLE` |
| `miller_madow` | `MillerMadow` |
| `grassberger` | `Grassberger` |
| `schurmann` | `Schurmann` |
| `schurmann_generalised` | `SchurmannGeneralised` |
| `bub` | `BUB` |
| `chao_shen` | `ChaoShen` |
| `zhang` | `Zhang` |
| `bonachela` | `Bonachela` |
| `shrink` | `Shrink` |
| `chao_wang_jost` | `ChaoWangJost` |
| `unseen` | `Unseen` |
| `bayes` | `Bayes` |
| `jeffrey` | `Jeffrey` |
| `laplace` | `Laplace` |
| `schurmann_grassberger` | `SchurmannGrassberger` |
| `minimax` | `Minimax` |
| `nsb` | `NSB` |
| `ansb` | `ANSB` |
| `pym` | `PYM` |
The type is mainly used with the function `estimate_h`; see Basic Usage below.
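For example, the estimator is selected purely by the type argument passed to `estimate_h` (using `from_data`, described in Basic Usage below; note that some estimators take extra parameters, so check the docstrings):

```julia
using DiscreteEntropy

cd = from_data([1, 2, 3, 4, 3, 2, 1], Histogram)

# swapping estimators is just a matter of changing the type argument
estimate_h(cd, MaximumLikelihood)
estimate_h(cd, MillerMadow)
estimate_h(cd, Grassberger)
```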
We also have some non-traditional mixed estimators: `jackknife`, which allows jackknife resampling to be applied to any estimator; `bayesian_bootstrap`, which applies bootstrap resampling to an estimator; and `pert`, a three-point estimation technique combining pessimistic, most likely, and optimistic estimates.
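A rough sketch of how these compose. The calling convention for `jackknife` below is an assumption, mirroring `estimate_h`; the PERT weighting shown is the classic three-point formula, and the package's own `pert` function may combine estimators differently. Consult the docstrings before relying on either.

```julia
using DiscreteEntropy

cd = from_data([1, 2, 3, 4, 3, 2, 1], Histogram)

# assumed calling convention, mirroring estimate_h(data, EstimatorType);
# check the docstring for the real signature and keyword arguments
h_jack = jackknife(cd, Zhang)

# the classic three-point (PERT) combination, shown only to illustrate
# the idea; the estimator choices here are ours, for illustration
pessimistic = estimate_h(cd, MaximumLikelihood)
likely      = estimate_h(cd, MillerMadow)
optimistic  = estimate_h(cd, ChaoShen)
h_pert = (pessimistic + 4 * likely + optimistic) / 6
```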
In addition, we provide a number of other information theoretic measures which use these estimators under the hood (the sketch after this list shows how one of them relates to `estimate_h`):

- `mutual_information`
- `conditional_entropy`
- `cross_entropy`
- `kl_divergence`
- `jensen_shannon_divergence`
- `jensen_shannon_distance`
- `jeffreys_divergence`
- `uncertainty_coefficient`
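For instance, `mutual_information` relates to plain entropy estimation through the identity I(X;Y) = H(X) + H(Y) - H(X,Y). A minimal sketch of that identity using only `estimate_h` and `from_data`; the integer pair-encoding and the choice of estimator are ours for illustration, and the dedicated `mutual_information` function may take different arguments:

```julia
using DiscreteEntropy

# two paired discrete samples (toy data for illustration)
x = [1, 1, 2, 2, 1, 2, 1]
y = [1, 2, 2, 2, 1, 2, 1]

# encode each (x, y) pair as a single integer so the joint
# distribution can be histogrammed like any other sample vector
xy = [10 * a + b for (a, b) in zip(x, y)]

hx  = estimate_h(from_data(x, Samples), MillerMadow)
hy  = estimate_h(from_data(y, Samples), MillerMadow)
hxy = estimate_h(from_data(xy, Samples), MillerMadow)

# I(X;Y) = H(X) + H(Y) - H(X,Y)
mi = hx + hy - hxy
```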
## Installing DiscreteEntropy
If you have not done so already, install Julia. Julia versions 1.8 through 1.10 are currently supported; nightly builds and Julia 1.11 are not (yet) supported.
Install `DiscreteEntropy` using

```julia
using Pkg; Pkg.add("DiscreteEntropy")
```
or

```julia
] add DiscreteEntropy
```
## Basic Usage
```julia-repl
julia> using DiscreteEntropy

julia> data = [1, 2, 3, 4, 3, 2, 1]
7-element Vector{Int64}:
 1
 2
 3
 4
 3
 2
 1
```
Most of the estimators take a `CountData` object: a compact representation of the histogram of the random variable. It can be easy to forget whether a vector represents a histogram or a set of samples, so `DiscreteEntropy` forces you to say which it is when creating a `CountData` object. The easiest way to create a `CountData` object is with `from_data`.
```julia-repl
julia> # if `data` is a histogram already
julia> cd = from_data(data, Histogram)
CountData([4.0 2.0 3.0 1.0; 1.0 2.0 2.0 2.0], 16.0, 7)

julia> # or, if `data` is actually a vector of samples
julia> cds = from_data(data, Samples)
CountData([2.0 1.0; 3.0 1.0], 7.0, 4)
```
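Reading the printed form above (this is our interpretation from the example, not official documentation): the first matrix row appears to hold the distinct count values, the second row how many bins have each count, followed by the total number of observations and the number of bins. A quick sanity check under that assumption:

```julia
# sanity-check our reading of CountData's printed representation
# (assumption: row 1 = distinct counts, row 2 = their multiplicities)
m = [4.0 2.0 3.0 1.0; 1.0 2.0 2.0 2.0]
total = sum(m[1, :] .* m[2, :])   # 16.0, matches the printed total
bins  = sum(m[2, :])              # 7.0, matches the printed bin count
```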
```julia-repl
julia> # now we can estimate, treating `data` as a histogram
julia> h = estimate_h(from_data(data, Histogram), ChaoShen);

julia> # or treating `data` as a vector of samples
julia> h = estimate_h(from_data(data, Samples), ChaoShen)
1.6310218225019266
```
`DiscreteEntropy.jl` outputs Shannon measures in nats. There are helper functions, `to_bits` and `to_bans`, to convert the result:
```julia-repl
julia> h = to_bits(estimate_h(cd, ChaoShen))
2.997302182277761

julia> h = to_bans(estimate_h(cd, ChaoShen))
0.9022778629347157
```
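The conversions are the standard change of logarithm base; the identity itself is standard mathematics, though whether `to_bits` and `to_bans` are implemented exactly this way is our assumption:

```julia
h_nats = estimate_h(cd, ChaoShen)

# nats -> bits: divide by log(2); nats -> bans: divide by log(10)
h_bits = h_nats / log(2)    # should match to_bits(h_nats)
h_bans = h_nats / log(10)   # should match to_bans(h_nats)
```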
## Contributing

All contributions are welcome! Please see CONTRIBUTING.md for details. Anyone wishing to add an estimator is particularly welcome. Ideally, the estimator will take a `CountData` struct, though this might not always be suitable (e.g. `schurmann_generalised`), and it should also be added to `estimate_h`. Any new estimator must also come with tests.