A Set is an unordered collection of things in which an item may appear at most once.
A Multiset is an unordered collection of things with repetition permitted.
julia> using Multisets
julia> M = Multiset{Type}()
where Type is the type of elements to be held in M
(e.g., Int or String).
If Type is omitted, this defaults to Any.
Given a collection list of elements (such as a Vector or Set)
invoking Multiset(list) creates a new Multiset in which the elements
of list appear with the appropriate multiplicity. For example,
Multiset(ones(Int,3)) creates the multiset {1,1,1}.
julia> M = Multiset([1,1,2,3,5])
{1,1,2,3,5}
julia> M = Multiset(5,3,2,1,1)
{1,1,2,3,5}
julia> eltype(M)
Int64
- push!(M,x)increases the multiplicity of- xin- Mby 1. If- xis not already in- M, then it is added to- M.
- push!(M,x,incr)increases the multiplicity of- xin- Mby- incr. We allow- incrto be negative to decrease the multiplicity of- x(but not below 0).
- M[x]=mexplicitly sets the multiplicty of- xto- m.
- delete!(M,x)removes- xfrom- M.- M[x]=0has the same effect.
- keys(M)returns an iterator for the elements of- M(that have multiplicyt at least one).
- values(M)returns an iterator for the multiplicities of the elements of- M.
- pairs(M)returns an iterator over- element => multiplicitypairs for- M.
julia> M = Multiset("alpha", "beta", "beta", "gamma", "gamma", "gamma")
{alpha,beta,beta,gamma,gamma,gamma}
julia> collect(keys(M))
3-element Array{String,1}:
 "alpha"
 "gamma"
 "beta"
julia> collect(values(M))
3-element Array{Int64,1}:
 1
 3
 2
julia> pairs(M)
Dict{String,Int64} with 3 entries:
  "alpha" => 1
  "gamma" => 3
  "beta"  => 2
To determine the multiplicity of x in M use M[x]. This returns 0
if x was never added to M.
To get a list of all the elements in M, use collect:
julia> collect(M)
6-element Array{Int64,1}:
 1
 1
 2
 2
 3
 4
Notice that elements are repeated per their multiplicity.
To get a list of the elements in which elements appear
only once each use collect(keys(M)).
To convert M into a Julia Set (effectively, set all multiplicities to 1)
use Set(M):
julia> Set(M)
Set([4,2,3,1])
The result of println(M) can be controlled by the following functions.
Suppose a multiset is created as follows:
julia> M = Multiset{String}();
julia> push!(M,"alpha");
julia> push!(M,"beta", 2);
- set_braces_show()causes multisets to be printed as a list enclosed in curly braces:- {alpha,beta,beta}. This is the default. If the multiset is empty,- ∅is printed.
- set_short_show()causes multisets to be printed in an abbreviated format like this:- Multiset{String} with 3 elements.
- set_julia_show()causes multisets to be printed in a form that would be a proper Julia definition of that multiset:- Multiset(String["alpha","beta","beta"]).
- set_key_value_show()causes multisets to be printed in a way that shows each element and its multiplicity as key-value pairs:- Multiset{String}("alpha" => 1, "beta" => 2).
The functions union and intersect compute the union and intersection
of multisets. For example:
julia> A = Multiset(1,2,2,3)
{1,2,2,3}
julia> B = Multiset(1,1,1,2,4)
{1,1,1,2,4}
julia> union(A,B)
{1,1,1,2,2,3,4}
julia> intersect(A,B)
{1,2}
The multiplicity of x in union(A,B) is max(A[x],B[x]) and
the multiplicity in intersect(A,B) is min(A[x],B[x]).
See + below (disjoint union) which behaves differently.
- 
The Cartesian product of multisets AandBis computed withA*B. Ifais an element ofAandbis an element ofBthen the multiplicity of(a,b)inA*BisA[x]*B[x].
- 
For a nonnegative integer nand a multisetAthe result ofn*Ais a new multiset in which the multiplicy ofxisn*A[x].
- 
The disjoint union of AandBis computed withA+B. Ifais an element ofAandbis an element ofBthen the multiplicity of(a,b)inA*BisA[x]+B[x].
- 
The difference of multisets is computed as A-B. In the result, the multiplicity ofxismax(0, A[x]-B[x]). This is not the same assetdiffbecausesetdiff(A,B)gives a multiset in which any element ofBis completely removed fromA.
julia> A = Multiset(1,1,2,3)
{1,1,2,3}
julia> B = Multiset(1,2)
{1,2}
julia> A-B
{1,3}
julia> setdiff(A,B)
{3}
The function length computes the cardinality (number of elements)
in a multiset (including multiplicities).
The function isempty returns true exactly when length(M)==0.
The operator A==B and the function issubset(A,B) are provided to determine
if A and B are equal or Ais a submultiset of B. This can also be expressed
as A ⊆ B or B ⊇ A.
Note that A==B holds when A[x]==B[x] for all x and issubset(A,B)
holds when A[x] <= B[x] for all x.
To test if x is an element of a multiset A, one may use any of the following:
- x ∈ A
- in(x,A)
- A ∋ x
- A[x] > 0
When iterating over a Multiset each element is repeated according to its
multiplicity.
julia> A = Multiset(1,2,1,2,3)
{1,1,2,2,3}
julia> for a in A
       println(a)
       end
2
2
3
1
1
julia> sum(A)
9
Multisets are useful devices for counting. For example, suppose a program
reads in words from a text file and we want to count how often each word
appears in that file. We can let M = Multiset{String}() and then
step through the words in the file pushing each instance into M.
The basic structure looks like this:
for word in FILE
  push!(M,word)
end
In the end, M[word] will return how often word was seen in the file.
See also my Counters module.
A Multiset consists of a single data field called data that is a
dictionary mapping elements to their multiplicities. The various
Multiset functions ensure the integrity of data (enforcing nonnegativity).
The function clean! purges the data field of any elements with multiplicity
equal to 0. This is used by the hash function which is provided so a Multiset
can be used as a key in a dictionary, etc. The hash of a
Multiset is simply the hash of its cleaned data field.
Note: The clean! function is not exported. There probably should be no
reason for the user to invoke it, but if desired use
Multisets.clean!(M).
If A is a Multiset, then B=A assigns B to be the same object as A. That is,
any changes to one affects the other:
julia> A = Multiset(1,1,2,3,5)
{1,1,2,3,5}
julia> B = A
{1,1,2,3,5}
julia> A[8]=2;
julia> A
{1,1,2,3,5,8,8}
julia> B
{1,1,2,3,5,8,8}
To make an independent copy, use copy or deepcopy:
julia> A = Multiset(1,1,2,3,5)
{1,1,2,3,5}
julia> B = deepcopy(A)
{1,1,2,3,5}
julia> A[8]=2;
julia> A
{1,1,2,3,5,8,8}
julia> B
{1,1,2,3,5}
Note that deepcopy duplicates all elements of the multiset. If the elements are immutable,
then deepcopy is more efficient than copy. On the other hand, if the elements are
mutable, then deepcopy is slower because it creates independent copies of the elements.