- Extend
Base.Broadcast by macros:
@tab: Tuple of Array Broadcast --- broadcast with multiple outputs will be stored in tuple of array (instead of array of tuple).
@mtb: MultiThread Broadcast --- perform broadcast with multiple threads.
@mtab: @mtb + @tab
@stb: force STructArray Broadcast --- it only works if user loads StructArrays.jl
@tab : support CuArray, OffsetArray, Tuple, StructArray, StaticArray
julia> a = randn(4000,4000);
julia> @tab b, c = sincos.(a);
julia> @tab b, c = broadcast(sincos,a);
julia> @tab b, c = broadcast(a) do x
sincos(x)
end;
julia> @tab b, c .= sincos.(a);
julia> broadcast!(sincos,(b,c),a);
- For
outputs <: AbstractArray
- Only the default
copy method which use similar(bc, T) is implemented, thus inputs like StaticArray is not allowed for non-inplace caluculation by default. We have an extension for @tab with StaticArrays.
@tab is not optimized for BitArray. The default return type is Array{Bool} for non-inplace broadcast.
- For
outputs <: Tuple, @tab first generate all results and then seperate them.
@tab is not designed for too many outputs.
@mtb : cpu multi-threads broadcast
julia> a = randn(4000,4000); b = similar(a);
julia> @btime @mtb @. $b = sin(a);
47.756 ms (22 allocations: 2.97 KiB)
julia> @btime @. $b = sin(a);
167.985 ms (2 allocations: 32 bytes)
julia> Threads.nthreads()
4
@mtb use CartesianPartition to seperate the task with dimension > 1
@mtb will be turned off automately for CuArray and Tuple
@mtb assume all elements in the dest array(s) are seperated in the memory and there's no thread safety check.
@mtb is not tuned for small arrays (It won't invoke the single thread version automately).
- User can change the number of threads by :
- Call
ExBroadcast.set_num_threads(n) for global change.
- Use 2 inputs macro
@mtb n [...] for local change. (thread safe)
@mtab only save some compile cost.