Reverse automated differentiation from an expression or a function
This package provides a function `rdiff()` that generates valid Julia code computing derivatives, up to any order, of a user-supplied expression or generic function. Install it with `Pkg.add("ReverseDiffSource")`. Package documentation and examples can be found here.

This version of automated differentiation operates at the source level: provided either an expression or a generic function, it outputs Julia code that calculates the derivatives (as an expression or a function, respectively). Compared to other automated differentiation methods it does not rely on method overloading or new types and should, in principle, produce fast code.
Usage examples:
- derivative of x³
```julia
julia> rdiff( :(x^3) , x=Float64)  # 'x=Float64' indicates the type of x to rdiff
:(begin
      (x^3,3 * x^2.0)  # the expression calculates a tuple (value, derivative)
  end)
```
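The generated expression can be spliced into a function and called. A minimal sketch, where `dcube` is just an illustrative name:

```julia
ex = rdiff( :(x^3) , x=Float64)
@eval dcube(x) = $ex   # splice the generated code into a function body
dcube(2.0)             # (8.0, 12.0): the value x^3 and its derivative 3x^2 at x = 2.0
```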
- first 10 derivatives of `sin(x)` (notice the simplifications)
```julia
julia> rdiff( :(sin(x)) , order=10, x=Float64)  # derivatives up to order 10
:(begin
      _tmp1 = sin(x)
      _tmp2 = cos(x)
      _tmp3 = -_tmp1
      _tmp4 = -_tmp2
      _tmp5 = -_tmp3
      (_tmp1,_tmp2,_tmp3,_tmp4,_tmp5,_tmp2,_tmp3,_tmp4,_tmp5,_tmp2,_tmp3)
  end)
```
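Since the derivatives of `sin` cycle with period 4, the generated code needs only five temporaries and reuses them in the result tuple. Evaluating at a point makes the cycle visible; a minimal sketch, with `dsin` again an illustrative name:

```julia
ex = rdiff( :(sin(x)) , order=10, x=Float64)
@eval dsin(x) = $ex
dsin(0.0)   # (0.0, 1.0, -0.0, -1.0, 0.0, ...): the value followed by the 10 derivatives
```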
- works on functions too
```julia
julia> rosenbrock(x) = (1 - x[1])^2 + 100(x[2] - x[1]^2)^2  # function to differentiate
julia> rosen2 = rdiff(rosenbrock, (Vector{Float64},), order=2)  # derivatives up to order 2
(anonymous function)
```
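The returned function yields the value followed by each requested derivative order. A sketch of a call, assuming the tuple holds the value, gradient, and Hessian in that order (numbers hand-computed for the point `[1., 2.]`):

```julia
val, grad, hess = rosen2([1., 2.])
# val  == 100.0
# grad == [-400.0, 200.0]
# hess == [402.0 -400.0; -400.0 200.0]
```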
- gradient calculation of a 3-hidden-layer neural network, for backpropagation

```julia
# w1-w3 are the hidden layer weight matrices, x1 the input vector
function ann(w1, w2, w3, x1)
    x2 = w1 * x1
    x2 = log(1. + exp(x2))    # soft RELU unit
    x3 = w2 * x2
    x3 = log(1. + exp(x3))    # soft RELU unit
    x4 = w3 * x3
    1. / (1. + exp(-x4[1]))   # sigmoid output
end

w1, w2, w3 = randn(10,10), randn(10,10), randn(1,10)
x1 = randn(10)

dann = rdiff(ann, (Matrix{Float64}, Matrix{Float64}, Matrix{Float64}, Vector{Float64}))
dann(w1, w2, w3, x1)   # network output + gradient on w1, w2, w3 and x1
```
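A central finite difference on a single weight gives a quick sanity check of the generated gradient; a minimal sketch (step size and index picked arbitrarily):

```julia
out, dw1, dw2, dw3, dx1 = dann(w1, w2, w3, x1)

h = 1e-6
w3p = copy(w3); w3p[1,1] += h
w3m = copy(w3); w3m[1,1] -= h
fd = (ann(w1, w2, w3p, x1) - ann(w1, w2, w3m, x1)) / (2h)
abs(fd - dw3[1,1])   # should be close to zero
```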
