A small Julia library for calculating the normalized compression distance.
Author simonschoelly
Started In
January 2021



This package provides methods for calculating the Normalized Compression Distance (NCD), a metric that measures how similar two strings are, based on a real-world compression algorithm such as bzip2.
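For orientation, the NCD is usually defined in the literature as shown below. The notation is an assumption added here and does not come from this README: $C(x)$ is the length in bytes of the compressed string $x$, and $xy$ is the concatenation of $x$ and $y$. The package's exact computation may differ in details.

$$
\mathrm{NCD}(x, y) = \frac{C(xy) - \min\{C(x),\, C(y)\}}{\max\{C(x),\, C(y)\}}
$$

Values near 0 indicate that the two strings share most of their structure, while values near 1 indicate that they share very little.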


InformationDistances.jl is registered in the General registry and can be installed from the REPL with

] add InformationDistances

Quick example

julia> using InformationDistances

# Create three strings to compare - we expect s1 and s2 to be closer to each other than either is to s3
julia> s1 = repeat("ab", 100)

julia> s2 = repeat("ba", 100)

julia> s3 = String(rand(('a', 'b'), 200))

# Create a normalized compression distance with the default parameters
julia> d = NormalizedCompressionDistance();

julia> d(s1, s2)

julia> d(s1, s3)

julia> d(s2, s3)

# Create another distance that uses Bzip2 for compression
julia> using CodecBzip2: Bzip2Compressor

julia> d_bzip2 = NormalizedCompressionDistance(CodecCompressor{Bzip2Compressor}(workfactor=250));

julia> d_bzip2(s1, s2)

julia> d_bzip2(s1, s3)

julia> d_bzip2(s2, s3)
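To make the numbers returned above easier to interpret, here is a minimal sketch of how an NCD value could be computed by hand. It is not the package's implementation. It assumes the CodecZlib package is installed and uses zlib purely for illustration; `clen` and `ncd_sketch` are hypothetical helper names introduced here.

```julia
using CodecZlib  # assumption: zlib is chosen only for illustration

# Hypothetical helper: length in bytes of the zlib-compressed string.
clen(s::AbstractString) = length(transcode(ZlibCompressor, Vector{UInt8}(String(s))))

# Hypothetical helper: the textbook NCD formula,
# (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).
function ncd_sketch(x::AbstractString, y::AbstractString)
    cx, cy, cxy = clen(x), clen(y), clen(x * y)
    return (cxy - min(cx, cy)) / max(cx, cy)
end

s1 = repeat("ab", 100)
s2 = repeat("ba", 100)

# Two highly regular, closely related strings should give a small value.
println(ncd_sketch(s1, s2))
```

Because real compressors add headers and are not perfectly idempotent, values computed this way are only approximate and may not match those returned by `NormalizedCompressionDistance`.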

Example Notebooks

The examples folder contains an interactive notebook that can be run with Pluto.jl. A static, non-interactive version is also available for quick viewing online; its options cannot currently be changed.


Li, Ming, Xin Chen, Xin Li, Bin Ma, and Paul M. B. Vitányi. "The similarity metric." IEEE Transactions on Information Theory 50, no. 12 (2004): 3250-3264.
