Probability distributions and measures for finite sample spaces whose
elements are labeled, that is, sample spaces consisting of the class
pool of a CategoricalArray.
Designed for performance in machine learning applications. For
example, probabilistic classifiers in
MLJ typically
predict the UnivariateFiniteVector objects defined in this package.
For probability distributions over integers see the Distributions.jl package, whose methods the current package extends.
using Pkg
Pkg.add("CategoricalDistributions")

The sample space of the UnivariateFinite distributions provided by
this package is the class pool of a CategoricalArray:
using CategoricalDistributions
using CategoricalArrays
import Distributions
import UnicodePlots # for optional pretty display
data = ["no", "yes", "no", "maybe", "maybe", "no",
"maybe", "no", "maybe", "no"] |> categorical
julia> d = Distributions.fit(UnivariateFinite, data)
UnivariateFinite{Multiclass{3}}
┌ ┐
maybe ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 0.4
no ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 0.5
yes ┤■■■■■■■ 0.1
└ ┘
julia> pdf(d, "no")
0.5
julia> mode(d)
CategoricalValue{String, UInt32} "no"

A UnivariateFinite distribution can also be constructed directly
from a probability vector:
julia> d2 = UnivariateFinite(["no", "yes"], [0.15, 0.85], pool=data)
UnivariateFinite{Multiclass{3}}
┌ ┐
no ┤■■■■■■ 0.15
yes ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 0.85
└ ┘

A UnivariateFinite distribution tracks all classes in the pool:
julia> levels(d2)
3-element Vector{String}:
"maybe"
"no"
"yes"
julia> pdf(d2, "maybe")
0.0
julia> pdf(d2, "okay")
ERROR: DomainError with Value okay not in pool. :

Arrays of UnivariateFinite distributions are defined using the same
constructor. Broadcasting methods, such as pdf, are optimized for
such arrays:
julia> v = UnivariateFinite(["no", "yes"], [0.1, 0.2, 0.3, 0.4], augment=true, pool=data)
4-element UnivariateFiniteArray{Multiclass{3}, String, UInt32, Float64, 1}:
UnivariateFinite{Multiclass{3}}(no=>0.9, yes=>0.1)
UnivariateFinite{Multiclass{3}}(no=>0.8, yes=>0.2)
UnivariateFinite{Multiclass{3}}(no=>0.7, yes=>0.3)
UnivariateFinite{Multiclass{3}}(no=>0.6, yes=>0.4)
julia> pdf.(v, "no")
4-element Vector{Float64}:
0.9
0.8
0.7
0.6
Query the UnivariateFinite doc-string for advanced constructor options.
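
As a rough illustration of the augment=true option used above, the following sketch builds what should be the same array from a full probability matrix, with one column per class in the support. The matrix form follows our reading of the constructor doc-string and should be checked against it; the names p_yes, P and w are illustrative only, and data is redefined here so that the snippet stands alone.

```julia
using CategoricalDistributions
using CategoricalArrays

# Any categorical vector with the desired levels can serve as the pool
data = categorical(["no", "yes", "maybe"])

# One "yes" probability per distribution; with `augment=true` the
# probability of the first class ("no") is filled in as 1 - p
p_yes = [0.1, 0.2, 0.3, 0.4]
v = UnivariateFinite(["no", "yes"], p_yes, augment=true, pool=data)

# Assumed equivalent: spell out both columns explicitly
P = hcat(1 .- p_yes, p_yes)      # 4×2 matrix, columns ordered "no", "yes"
w = UnivariateFinite(["no", "yes"], P, pool=data)

# The two arrays should agree up to floating-point rounding
pdf.(v, "no") ≈ pdf.(w, "no")
pdf.(v, "yes") ≈ pdf.(w, "yes")
```

The augmented form is convenient when only one class probability is stored per observation, as in a binary classifier that outputs only the probability of the positive class.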
A (non-standard) implementation of pdf allows for extraction of the full
probability array:
julia> L = levels(data)
3-element Vector{String}:
"maybe"
"no"
"yes"
julia> pdf(v, L)
4×3 Matrix{Float64}:
0.0 0.9 0.1
0.0 0.8 0.2
0.0 0.7 0.3
0.0 0.6 0.4

There is, in fact, no enforcement that probabilities in a
UnivariateFinite distribution sum to one, only that they belong
to a type T for which zero(T) is defined. In particular,
UnivariateFinite objects can represent arbitrary non-negative, signed,
or complex measures over a finite labeled set.

The package provides:

- A new type UnivariateFinite{S} for representing probability distributions over the pool of a CategoricalArray, that is, over finite labeled sets. Here S is a subtype of OrderedFactor from ScientificTypesBase.jl, if the pool is ordered, or of Multiclass if the pool is unordered.
- A new array type UnivariateFiniteArray{S} <: AbstractArray{<:UnivariateFinite{S}} for efficiently manipulating arrays of UnivariateFinite distributions.
- Implementations of rand for generating random samples of a UnivariateFinite distribution.
- Implementations of the pdf, logpdf, mode and modes methods of Distributions.jl, with efficient broadcasting over the new array type.
- Implementation of Distributions.fit from Distributions.jl for UnivariateFinite distributions.
- A single constructor for constructing UnivariateFinite distributions and arrays thereof, from arrays of probabilities.

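
The methods listed above can be exercised together in a short session. This is a minimal sketch rather than a reference: the variable names (obs, p_no, lp_no, m, x, xs) are made up for illustration, and logpdf is called with the Distributions prefix in case it is not re-exported.

```julia
using CategoricalDistributions
using CategoricalArrays
import Distributions

# Ten labeled observations: 5 "no", 4 "maybe", 1 "yes"
obs = categorical(["no", "yes", "no", "maybe", "maybe",
                   "no", "maybe", "no", "maybe", "no"])

# Fit a distribution to the observed class frequencies
d = Distributions.fit(UnivariateFinite, obs)

p_no  = pdf(d, "no")                    # probability assigned to "no"
lp_no = Distributions.logpdf(d, "no")   # natural logarithm of that probability
m     = mode(d)                         # most probable class, a CategoricalValue

x  = rand(d)       # a single random draw from d
xs = rand(d, 5)    # five draws
```

Samples and the mode are returned as categorical values from the same pool as obs, so they can be passed straight back to pdf.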
The initial release of this package is based almost entirely on code originally residing in MLJBase.jl with contributions from Anthony Blaom, Thibaut Lienart, Samuel Okon, and Chad Scherrer. These contributions are not reflected in the current repository's commit history.