Popularity
54 Stars
Updated Last
10 Months Ago
Started In
December 2017


This package defines a Composition{D} type representing a D-part composition as defined by Aitchison 1986. In Aitchison's geometry, the D-simplex together with addition (a.k.a. pertubation) and scalar multiplication (a.k.a. scaling) form a vector space, and important properties hold:

  • Scaling invariance
  • Pertubation invariance
  • Permutation invariance
  • Subcompositional coherence

In practice, this means that one can operate on compositional data (i.e. vectors whose entries represent parts of a total) without destroying the ratios of the parts.

Installation

Get the latest stable release with Julia's package manager:

] add CoDa

Usage

Basics

Compositions are static vectors with named parts:

julia> using CoDa

julia> c = Composition(CO₂=2.0, CH₄=0.1, N₂O=0.3)
                  3-part composition
       ┌                                        ┐ 
   CO₂ ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 2.0   
   CH₄ ┤■■ 0.1                                    
   N₂O ┤■■■■■ 0.3                                 
       └                                        ┘ 

julia> parts(c)
(:CO₂, :CH₄, :N₂O)

julia> components(c)
3-element StaticArrays.SVector{3, Union{Missing, Float64}} with indices SOneTo(3):
 2.0
 0.1
 0.3

julia> c.CO₂
2.0

Default names are added otherwise:

julia> c = Composition(1.0, 0.1, 0.1)
                     3-part composition
      ┌                                        ┐ 
   w1 ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 1.0   
   w2 ┤■■■■ 0.1                                  
   w3 ┤■■■■ 0.1                                  
      └                                        ┘ 

and serve for internal compile-time checks.

Compositions can be added, subtracted, negated, and multiplied by scalars. Other operations are also defined including dot product, induced norm, and distance:

julia> cₒ = Composition(CO₂=1.0, CH₄=0.1, N₂O=0.1)
                  3-part composition
       ┌                                        ┐ 
   CO₂ ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 1.0   
   CH₄ ┤■■■■ 0.1                                  
   N₂O ┤■■■■ 0.1                                  
       └                                        ┘ 

julia> -cₒ
                  3-part composition
       ┌                                        ┐ 
   CO₂ ┤■■ 0.047619047619047616                   
   CH₄ ┤■■■■■■■■■■■■■■■■■■■ 0.47619047619047616   
   N₂O ┤■■■■■■■■■■■■■■■■■■■ 0.47619047619047616   
       └                                        ┘ 

julia> 0.5c
                  3-part composition
       ┌                                        ┐ 
   CO₂ ┤■■■■■■■■■■■■■■■■■■■■ 0.6207690197922022   
   CH₄ ┤■■■■ 0.13880817265812764                  
   N₂O ┤■■■■■■■■ 0.24042280754967013              
       └                                        ┘ 

julia> c - cₒ
                  3-part composition
       ┌                                        ┐ 
   CO₂ ┤■■■■■■■■■■■■■■■■■■■■■■■ 0.3333333333333333  
   CH₄ ┤■■■■■■■■■■■■ 0.16666666666666666          
   N₂O ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 0.5   
       └                                        ┘ 

julia> c  cₒ
3.7554028908352994

julia> norm(c)
2.1432393747688687

julia> aitchison(c, cₒ) # Aitchison distance
0.7856640352007868

More complex functions can be defined in terms of these operations. For example, the function below defines the composition line passing through cₒ in the direction of c:

julia> f(λ) = cₒ + λ*c
f (generic function with 1 method)

Finally, two compositions are considered to be equal when their closure is approximately equal:

julia> c == c
true

julia> c == cₒ
false

Transforms

Currently, the following transforms are implemented:

julia> alr(c)
2-element StaticArrays.SArray{Tuple{2},Float64,1,2} with indices SOneTo(2):
  1.8971199848858813
 -1.0986122886681096

julia> clr(c)
3-element StaticArrays.SArray{Tuple{3},Float64,1,3} with indices SOneTo(3):
  1.6309507528132907
 -1.3647815207407001
 -0.2661692320725906

julia> ilr(c)
2-element StaticArrays.SArray{Tuple{2},Float64,1,2} with indices SOneTo(2):
 -2.1183026052494185
 -0.3259894019031434

and their inverses alrinv, clrinv and ilrinv.

The package also defines transforms for tables following to the TableTransforms.jl interface, including Closure, Remainder, ALR, CLR, ILR. These transforms are functors that can be used as follows:

julia> table |> ILR()

Arrays

It is often useful to compose D columns of a table into D-part compositions. The package provides a CoDaArray type that implements the Julia array interface and the Tables.jl interface. We recommend using the function compose(table, cols) to construct such arrays:

julia> table = (a=[1,2,3], b=[4,5,6], c=[7,8,9])
(a = [1, 2, 3], b = [4, 5, 6], c = [7, 8, 9])

julia> ctable = compose(table, (:a,:b))
(c = [7, 8, 9], coda = Composition{2, (:a, :b)}["1.000 : 4.000", "2.000 : 5.000", "3.000 : 6.000"])

julia> ctable.coda[1]
                2-part composition
     ┌                                        ┐ 
   a ┤■■■■■■■■■ 1.0                             
   b ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 4.0   
     └                                        ┘ 

Random

D-part compositions can be created at random from a Dirichlet distribution:

julia> rand(Composition{3})
                 3-part composition
      ┌                                        ┐ 
   w1 ┤■■■■■■■■■■■■■■■■■ 0.39938229705106565     
   w2 ┤■■■■■■ 0.1491859823748656                 
   w3 ┤■■■■■■■■■■■■■■■■■■■ 0.45143172057406883   
      └                                        ┘

Plots

Separate packages are available for plotting compositional data:

References

This package is heavily influenced by Aitchison's monograph:

  • Aitchison, J. 1986. The Statistical Analysis of Compositional Data

and by other textbooks:

  • den Boogaart, K. & Tolosana-Delgado. 2011. Analyzing Compositional Data with R
  • Pawlowsky-Glahn et al. 2015. Modeling and Analysis of Compositional Data
  • Pawlowsky-Glahn, V. & Buccianti, A. 2011. Compositional Data Analysis - Theory and Applications

Notes

The unicode display of composition objects can be obtained with the following code:

using UnicodePlots
using CoDa

function Base.show(io::IO, mime::MIME"text/plain",
                   c::Composition{D,PARTS}) where {D,PARTS}
  w = components(c)
  x = Vector{Float64}()
  p = Vector{Symbol}()
  m = Vector{Symbol}()
  for i in 1:D
    if ismissing(w[i])
      push!(m, PARTS[i])
    else
      push!(p, PARTS[i])
      push!(x, w[i])
    end
  end
  plt = barplot(p, x, title="$D-part composition")
  isempty(m) || annotate!(plt, :t, "missing: $(join(m,", "))")
  show(io, mime, plt)
end

The code is not added to the CoDa.jl package itself because the UnicodePlots.jl package has become a very heavy dependency, see UnicodePlots/issues/291.