DataTypes

DiscreteEntropy.EntropyData — Type

abstract type EntropyData
Histogram <: EntropyData
Samples <: EntropyData

When confronted with a vector such as $[1,2,3,4,5,4]$, it is very easy to forget whether it represents samples from a distribution or a histogram of a (discrete) distribution. DiscreteEntropy.jl tries to make this mistake hard to commit by enforcing a type difference between a vector of samples and a vector of counts.

See svector and cvector.
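To see why the distinction matters, the sketch below reads the same vector both ways using plain Julia only. The `tally` helper is hypothetical and is not part of DiscreteEntropy.jl; it only illustrates that the two readings describe different data.

```julia
# The same vector, read two ways.
v = [1, 2, 3, 4, 5, 4]

# Reading 1: v is a histogram, so each entry is already a bin count.
counts_if_histogram = v

# Reading 2: v is a sequence of samples, so occurrences must be tallied first.
# `tally` is a hypothetical helper written for this illustration.
function tally(samples)
    freq = Dict{eltype(samples),Int}()
    for s in samples
        freq[s] = get(freq, s, 0) + 1
    end
    # Order the counts by sampled value so the result is reproducible.
    return [freq[k] for k in sort(collect(keys(freq)))]
end

counts_if_samples = tally(v)

println(sum(counts_if_histogram))  # total observations under reading 1
println(sum(counts_if_samples))    # total observations under reading 2
```

Under the histogram reading there are six bins, while under the samples reading there are five distinct values, so any entropy estimate computed from the two readings will generally differ.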


DiscreteEntropy.AbstractCounts — Type

AbstractCounts{T<:Real,V<:AbstractVector{T}} <: AbstractVector{T}

Enforced type incompatibility between vectors of samples, vectors of counts, and vectors of xi.

CountVector

A vector representing a histogram

SampleVector

A vector of samples

XiVector

A vector of xi values for use with the schurmann_generalised estimator.


DiscreteEntropy.CountData — Type

CountData

Fields

multiplicities::Matrix{Float64}: multiplicity representation of data
N::Float64: total number of samples
K::Int64: observed support size

Multiplicities

All of the estimators operate over a multiplicity representation of raw data. Raw data takes the form of either a vector of samples or a vector of counts (i.e. a histogram).

Given histogram = [1,2,3,2,1,4], the multiplicity representation is

\[\begin{pmatrix} 4 & 2 & 3 & 1 \\ 1 & 2 & 1 & 2 \end{pmatrix}\]

The top row gives the bin contents, and the bottom row gives the number of bins with that content. Here we have 1 bin with 4 elements, 2 bins with 2 elements, 1 bin with 3 elements and 2 bins with only 1 element.

The advantages of the multiplicity representation are compactness and efficiency. Instead of calculating the surprisal of a bin of 2 twice, we can calculate it once and multiply by the multiplicity. A possible downside is floating-point error introduced by the extra multiplication.
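The following is a minimal sketch of how such a representation could be built and used, written in plain Julia. The helpers `multiplicities` and `plugin_entropy` are hypothetical names introduced for illustration and are not the library's implementation; the sketch sorts columns by bin content, so the column order differs from the matrix shown above, and it assumes a simple plug-in entropy (in nats) as the quantity being computed.

```julia
# Hypothetical helper: build a 2×m matrix whose top row holds the distinct
# bin contents and whose bottom row holds how many bins have that content.
function multiplicities(histogram::AbstractVector{<:Integer})
    mult = Dict{Int,Int}()
    for c in histogram
        c == 0 && continue                 # skip empty bins
        mult[c] = get(mult, c, 0) + 1
    end
    contents = sort(collect(keys(mult)))
    top = permutedims(Float64.(contents))
    bottom = permutedims(Float64[mult[c] for c in contents])
    return vcat(top, bottom)
end

# Hypothetical helper: plug-in entropy computed once per distinct bin content
# and weighted by its multiplicity, rather than once per bin.
function plugin_entropy(m::AbstractMatrix)
    N = sum(m[1, :] .* m[2, :])            # total number of samples
    h = 0.0
    for j in axes(m, 2)
        p = m[1, j] / N
        h -= m[2, j] * p * log(p)
    end
    return h
end

m = multiplicities([1, 2, 3, 2, 1, 4])
N = sum(m[1, :] .* m[2, :])                # total number of samples
K = Int(sum(m[2, :]))                      # number of non-empty bins
println(m)
println((N, K, plugin_entropy(m)))
```

For this histogram the sketch evaluates four logarithm terms instead of six, which is the saving described above.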

Constructor

CountData is not expected to be constructed directly, nor is it advised to manipulate its fields directly. Use from_data, from_counts or from_samples instead.


DiscreteEntropy.from_counts — Function

 from_counts(counts::AbstractVector; remove_zeros::Bool=true)
 from_counts(counts::CountVector, remove_zeros::Bool)

Return a CountData object from a vector or CountVector. Many estimators cannot handle a histogram with a 0 value bin, so these are filtered out unless remove_zeros is set to false.
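As a rough illustration of why empty bins cause trouble, the sketch below applies a naive plug-in entropy formula in plain Julia to a histogram with zero bins, then repeats it after dropping them. This is not the library's code; it only assumes that an estimator evaluates a logarithm of each bin probability.

```julia
counts = [3, 0, 2, 0, 5]

# Naive formula applied to every bin: log(0.0) is -Inf, and 0.0 * -Inf is NaN,
# so a single empty bin poisons the whole sum.
p = counts ./ sum(counts)
naive = -sum(p .* log.(p))

# Dropping the empty bins first gives a finite result.
nonzero = filter(!iszero, counts)
q = nonzero ./ sum(nonzero)
h = -sum(q .* log.(q))

println((naive, h))
```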


DiscreteEntropy.from_data — Function

from_data(data::AbstractVector, ::Type{T}; remove_zeros=true) where {T<:EntropyData}

Create a CountData object from a vector or matrix. The function is parameterised on whether the vector contains samples or a histogram.

Zeros are removed automatically when data is treated as a count vector, but not when data is a vector of samples.


DiscreteEntropy.from_samples — Function

 from_samples(sample::SampleVector)

Return a CountData object from a vector of samples.


Vector Types

DiscreteEntropy.cvector — Function

 cvector(vs::AbstractVector; filter=false)
 cvector(vs::AbstractVector{<:Integer})
 cvector(vs::AbstractVector{<:Real}) = CountVector(vs)
 cvector(vs::AbstractArray{<:Real}) = CountVector(vec(vs))

Convert an AbstractVector into a CountVector. A CountVector represents the frequency of sampled values. If filter = true, zero counts are removed.


DiscreteEntropy.svector — Function

svector(vs::AbstractVector{<:Integer})
svector(vs::AbstractVector{<:Real})
svector(vs::AbstractArray{<:Real})

Convert an AbstractVector into a SampleVector. A SampleVector represents a sequence of sampled values.


DiscreteEntropy.xivector — Function

 xivector(vs::AbstractVector{<:Real})
 xivector(vs::AbstractArray{<:Real})

Convert an AbstractVector{<:Real} into a XiVector. Exclusively for use with schurmann_generalised.
