NMF.jl
A Julia package for nonnegative matrix factorization (NMF).
Development Status
Note: Nonnegative Matrix Factorization is an area of active research. New algorithms are proposed every year. Contributions are very welcomed.
Done
 Lee & Seung's Multiplicative Update (for both MSE & Divergence objectives)
 (Naive) Projected Alternate Least Squared
 ALS Projected Gradient Methods
 Coordinate Descent Methods
 Random Initialization
 NNDSVD Initialization
 Sparse NMF
 Separable NMF
To do
 Probabilistic NMF
Overview
Nonnegative Matrix Factorization (NMF) generally refers to the techniques for factorizing a nonnegative matrix X
into the product of two lower rank matrices W
and H
, such that WH
optimally approximates X
in some sense. Such techniques are widely used in text mining, image analysis, and recommendation systems.
This package provides two sets of tools, respectively for initilization and optimization. A typical NMF procedure consists of two steps: (1) use an initilization function that initialize W
and H
; and (2) use an optimization algorithm to pursue the optimal solution.
Most types and functions (except the highlevel function nnmf
) in this package are not exported. Users are encouraged to use them with the prefix NMF.
. This way allows us to use shorter names within the package and makes the codes more explicit and clear on the user side.
HighLevel Interface
The package provides a highlevel function nnmf
that runs the entire procedure (initialization + optimization):
nnmf(X, k, ...)
This function factorizes the input matrix X
into the product of two nonnegative matrices W
and H
.
In general, it returns a result instance of type NMF.Result
, which is defined as
struct Result
W::Matrix{Float64} # W matrix
H::Matrix{Float64} # H matrix
niters::Int # number of elapsed iterations
converged::Bool # whether the optimization procedure converges
objvalue::Float64 # objective value of the last step
end
The function supports the following keyword arguments:

init
: A symbol that indicates the initialization method (default =:nndsvdar
).This argument accepts the following values:
random
: matrices filled with uniformly random valuesnndsvd
: standard version of NNDSVDnndsvda
: NNDSVDa variantnndsvdar
: NNDSVDar variantspa
: Successive Projection Algorithmcustom
: use custom matricesW0
andH0
With the
nndsvd
variants, you can optionally supply the SVD result viainitdata
, e.g.,initdata=svd(X)
. If not supplied, the factorization is computed using a randomized svd (rsvd
from RandomizedLinAlg). 
alg
: A symbol that indicates the factorization algorithm (default =:greedycd
).This argument accepts the following values:
multmse
: Multiplicative update (using MSE as objective)multdiv
: Multiplicative update (using divergence as objective)projals
: (Naive) Projected Alternate Least Squarealspgrad
: Alternate Least Square using Projected Gradient Descentcd
: Coordinate Descent solver that uses Fast Hierarchical Alternating Least Squares (implemetation similar to scikitlearn)greedycd
: Greedy Coordinate Descentspa
: Successive Projection Algorithm

maxiter
: Maximum number of iterations (default =100
). 
tol
: tolerance of changes upon convergence (default =1.0e6
). 
replicates
: Number of times to perform factorization (default =1
). 
W0
: Option for custom initialization (default =nothing
). 
H0
: Option for custom initialization (default =nothing
).Note:
W0
andH0
may be overwritten. If one needs to avoid it, please pass in copies themselves. 
update_H
: Option for specifying whether to update H (default =true
). 
verbose
: whether to show procedural information (default =false
).
Initialization

NMF.randinit(X, k[; zeroh=false, normalize=false])
Initialize
W
andH
given the input matrixX
and the rankk
. This function returns a pair(W, H)
.Suppose the size of
X
is(p, n)
, then the size ofW
andH
are respectively(p, k)
and(k, n)
.Usage:
W, H = NMF.randinit(X, 3)
For some algorithms (e.g. ALS), only
W
needs to be initialized. For such cases, one may set the keyword argumentzeroh
to betrue
, then in the outputH
will be simply a zero matrix of size(k, n)
.Another keyword argument is
normalize
. Ifnormalize
is set totrue
, columns ofW
will be normalized such that each column sum to one. 
NMF.nndsvd(X, k[; zeroh=false, variant=:std])
Use the NonNegative Double Singular Value Decomposition (NNDSVD) algorithm to initialize
W
andH
.Reference: C. Boutsidis, and E. Gallopoulos. SVD based initialization: A head start for nonnegative matrix factorization. Pattern Recognition, 2007.
Usage:
W, H = NMF.nndsvd(X, k)
This function has two keyword arguments:
zeroh
: haveH
initialized when it is set totrue
, or setH
to all zeros when it is set tofalse
.variant
: the variant of the algorithm. Default isstd
, meaning to use the standard version, which would generate a rather sparseW
. Other values area
andar
, respectively corresponding to the variants: NNDSVDa and NNDSVDar. Particularly,ar
is recommended for dense NMF.

NMF.spa(X, k)
Use the Successive Projection Algorithm (SPA) to initialize
W
andH
.Reference: N. Gillis and S. A. Vavasis, Fast and robust recursive algorithms for separable nonnegative matrix factorization, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 4, pp. 698714, 2013.
Usage:
W, H = NMF.spa(X, k)
Factorization Algorithms
This package provides multiple factorization algorithms. Each algorithm corresponds to a type. One can create an algorithm instance by choosing a type and specifying the options, and run the algorithm using NMF.solve!
:
The NMF.solve! Function
NMF.solve!(alg, X, W, H)
Use the algorithm alg
to factorize X
into W
and H
.
Here, W
and H
must be preallocated matrices (respectively of size (p, k)
and (k, n)
). W
and H
must be appropriately initialized before this function is invoked. For some algorithms, both W
and H
must be initialized (e.g. multiplicative updating); while for others, only W
needs to be initialized (e.g. ALS).
The matrices W
and H
are updated in place.
Algorithms

Multiplicative Updating
Reference: Daniel D. Lee and H. Sebastian Seung. Algorithms for Nonnegative Matrix Factorization. Advances in NIPS, 2001.
This algorithm has two different kind of objectives: minimizing meansquarederror (
:mse
) and minimizing divergence (:div
). BothW
andH
need to be initialized.MultUpdate(obj::Symbol=:mse, # objective, either :mse or :div maxiter::Integer=100, # maximum number of iterations verbose::Bool=false, # whether to show procedural information tol::Real=1.0e6, # tolerance of changes on W and H upon convergence update_H::Bool=true, # whether to update H lambda_w::Real=0.0, # L1 regularization coefficient for W lambda_h::Real=0.0) # L1 regularization coefficient for H
Note: the values above are default values for the keyword arguments. One can override part (or all) of them.

(Naive) Projected Alternate Least Square
This algorithm alternately updates
W
andH
while holding the other fixed. Each update step solvesW
orH
without enforcing the nonnegativity constrait, and forces all negative entries to zeros afterwards. OnlyW
needs to be initialized.ProjectedALS(maxiter::Integer=100, # maximum number of iterations verbose::Bool=false, # whether to show procedural information tol::Real=1.0e6, # tolerance of changes on W and H upon convergence update_H::Bool=true, # whether to update H lambda_w::Real=1.0e6, # L2 regularization coefficient for W lambda_h::Real=1.0e6) # L2 regularization coefficient for H

Alternate Least Square Using Projected Gradient Descent
Reference: ChihJen Lin. Projected Gradient Methods for Nonnegative Matrix Factorization. Neural Computing, 19 (2007).
This algorithm adopts the alternate least square strategy. A efficient projected gradient descent method is used to solve each subproblem. Both
W
andH
need to be initialized.ALSPGrad(maxiter::Integer=100, # maximum number of iterations (in main procedure) maxsubiter::Integer=200, # maximum number of iterations in solving each subproblem tol::Real=1.0e6, # tolerance of changes on W and H upon convergence tolg::Real=1.0e4, # tolerable gradient norm in subproblem (firstorder optimality) update_H::Bool=true, # whether to update H verbose::Bool=false) # whether to show procedural information

Coordinate Descent solver with Fast Hierarchical Alternating Least Squares
Reference: Cichocki, Andrzej, and P. H. A. N. AnhHuy. Fast local algorithms for large scale nonnegative matrix and tensor factorizations. IEICE transactions on fundamentals of electronics, communications and computer sciences 92.3: 708721 (2009).
Sequential constrained minimization on a set of squared Euclidean distances over W and H matrices. Uses l_1 and l_2 penalties to enforce sparsity.
CoordinateDescent(maxiter::Integer=100, # maximum number of iterations (in main procedure) verbose::Bool=false, # whether to show procedural information tol::Real=1.0e6, # tolerance of changes on W and H upon convergence update_H::Bool=true, # whether to update H α::Real=0.0, # constant that multiplies the regularization terms regularization=:both, # select whether the regularization affects the components (H), the transformation (W), both or none of them (:components, :transformation, :both, :none) l₁ratio::Real=0.0, # l1 / l2 regularization mixing parameter (in [0; 1]) shuffle::Bool=false) # if true, randomize the order of coordinates in the CD solver

Greedy Coordinate Descent
Reference: ChoJui Hsieh and Inderjit S. Dhillon. Fast coordinate descent methods with variable selection for nonnegative matrix factorization. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1064–1072 (2011).
This algorithm is a fast coordinate descent method with variable selection. Both
W
andH
need to be initialized.GreedyCD(maxiter::Integer=100, # maximum number of iterations (in main procedure) verbose::Bool=false, # whether to show procedural information tol::Real=1.0e6, # tolerance of changes on W and H upon convergence update_H::Bool=true, # whether to update H lambda_w::Real=0.0, # L1 regularization coefficient for W lambda_h::Real=0.0) # L1 regularization coefficient for H

Successive Projection Algorithm for Separable NMF
Reference: N. Gillis and S. A. Vavasis, "Fast and robust recursive algorithms for separable nonnegative matrix factorization," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 4, pp. 698714, 2013.
A separable matrix X can be written as
X = WH = W[I V]P
, whereW
has rankk
,I
is the identity matrix, the sum of the entries of each column ofV
is at most one, andP
is a permutation matrix to arange the columns of[I V]
randomly. Separable NMF aims to decompose a separable matrixX
into two nonnegative factor matricesW
andH
, so thatWH
is equal toX
. This algorithm is used for separable NMF. BothW
andH
need to be initialized byinit=:spa
.SPA(obj::Symbol=:mse) # objective :mse or :div
Examples
Here are examples that demonstrate how to use this package to factorize a nonnegative dense matrix.
Use Highlevel Function: nnmf
... # prepare input matrix X
r = nnmf(X, k; alg=:multmse, maxiter=30, tol=1.0e4)
W = r.W
H = r.H
Use Multiplicative Update
import NMF
# initialize
W, H = NMF.randinit(X, 5)
# optimize
NMF.solve!(NMF.MultUpdate{Float64}(obj=:mse,maxiter=100), X, W, H)
Use Naive ALS
import NMF
# initialize
W, H = NMF.randinit(X, 5)
# optimize
NMF.solve!(NMF.ProjectedALS{Float64}(maxiter=50), X, W, H)
Use ALS with Projected Gradient Descent
import NMF
# initialize
W, H = NMF.nndsvd(X, 5, variant=:ar)
# optimize
NMF.solve!(NMF.ALSPGrad{Float64}(maxiter=50, tolg=1.0e6), X, W, H)
Use Coordinate Descent
import NMF
# initialize
W, H = NMF.nndsvd(X, 5, variant=:ar)
# optimize
NMF.solve!(NMF.CoordinateDescent{Float64}(maxiter=50, α=0.5, l₁ratio=0.5), X, W, H)
Use Greedy Coordinate Descent
import NMF
# initialize
W, H = NMF.nndsvd(X, 5, variant=:ar)
# optimize
NMF.solve!(NMF.GreedyCD{Float64}(maxiter=50), X, W, H)
Use Successive Projection Algorithm for Separable NMF
import NMF
# initialize
W, H = NMF.spa(X, 5)
# optimize
NMF.solve!(NMF.SPA{Float64}(obj=:mse), X, W, H)