Multi-threaded BLAS-like library that provides pure Julia matrix multiplication
201 stars. Last updated 7 months ago. Started in December 2020.



To make sure CPUSummary 1.11 or newer is using Hwloc, you may want to run

julia> using CPUSummary

julia> CPUSummary.use_hwloc(true);

which should enable accurate detection of hardware information. Using Hwloc is already the default, so this step is typically unnecessary.
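As a quick sanity check of what was detected, the sketch below compares the core count reported by CPUSummary with the logical thread count Julia sees. It assumes that `CPUSummary.num_cores()` returns the detected number of physical cores; the variable names are illustrative only.

```julia
using CPUSummary

# Assumption: `CPUSummary.num_cores()` reports the detected physical core count.
# It may return a static integer, so convert it to a plain Int for printing.
cores = Int(CPUSummary.num_cores())

# Logical CPU threads as seen by Julia itself (standard library).
threads = Sys.CPU_THREADS

println("CPUSummary physical cores: ", cores)
println("Julia logical CPU threads: ", threads)
```

If the two numbers look implausible for your machine (for example, more physical cores than logical threads), the hardware information is probably not being detected correctly.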

Octavian.jl is a multi-threaded BLAS-like library that provides pure Julia matrix multiplication on the CPU, built on top of LoopVectorization.jl.
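A minimal usage sketch, assuming the `Octavian.matmul` and `Octavian.matmul!` entry points described in the Octavian documentation; the matrix sizes are arbitrary and chosen only for illustration.

```julia
using Octavian
using LinearAlgebra  # only used for the reference result computed with `*`

A = rand(200, 300)
B = rand(300, 100)

# Allocating multiply: returns a new 200×100 matrix.
C = Octavian.matmul(A, B)

# In-place multiply: writes A * B into a preallocated output.
C2 = similar(C)
Octavian.matmul!(C2, A, B)

# Compare against the default BLAS-backed multiplication.
println(isapprox(C, A * B))
println(isapprox(C2, A * B))
```

The in-place form avoids allocating the output on every call, which matters when the same multiplication is repeated in a loop.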

Please see the Octavian documentation.

Octavian has dropped support for 32-bit Julia. See PR #157. If you're interested in restoring it, please file a PR that fixes the failing tests.


You can run benchmarks using BLASBenchmarksCPU.jl:

julia> @time using BLASBenchmarksCPU
  7.278954 seconds (17.59 M allocations: 1.107 GiB, 6.22% gc time)

julia> rb = runbench(sizes = logspace(10, 1_000, 200)); plot(rb, displayplot = false);
Progress: 100%|████████████████████████████████████████| Time: 2:25:04
  Size:               (1000, 1000, 1000)
  BLIS:               (MedianGFLOPS = 1051.0, MaxGFLOPS = 1476.0)
  Gaius:              (MedianGFLOPS = 765.8, MaxGFLOPS = 941.7)
  MKL:                (MedianGFLOPS = 1348.0, MaxGFLOPS = 1589.0)
  Octavian:           (MedianGFLOPS = 1816.0, MaxGFLOPS = 1895.0)
  OpenBLAS:           (MedianGFLOPS = 1254.0, MaxGFLOPS = 1385.0)
  Tullio:             (MedianGFLOPS = 1102.0, MaxGFLOPS = 1196.0)
  LoopVectorization:  (MedianGFLOPS = 1552.0, MaxGFLOPS = 1721.0)

julia> versioninfo()
Julia Version 1.7.0-DEV.1124
Commit d18cf93bac* (2021-05-19 16:11 UTC)
Platform Info:
  OS: Linux (x86_64-generic-linux)
  CPU: Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, cascadelake)

This resulted in the following plot (image: octavian10980xebench).
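To help read the GFLOPS figures above, here is a rough sketch of how such a number can be derived from a timing. It assumes the usual convention of counting 2·M·K·N floating-point operations for an M×K by K×N product; BLASBenchmarksCPU may measure and aggregate differently. The helpers `matmul_flops`, `gflops`, and `best_gflops` are hypothetical names, and `mul!` from the standard library stands in for the library under test.

```julia
using LinearAlgebra

# Floating-point operations for C = A * B with A of size M×K and B of size K×N:
# each of the M*N outputs needs K multiplications and K additions.
matmul_flops(M, K, N) = 2 * M * K * N

# Convert an elapsed time in seconds into GFLOPS.
gflops(M, K, N, seconds) = matmul_flops(M, K, N) / seconds / 1e9

# Hypothetical helper: time `f!(C, A, B)` several times and keep the fastest run.
function best_gflops(f!, M, K, N; trials = 5)
    A = rand(M, K)
    B = rand(K, N)
    C = zeros(M, N)
    f!(C, A, B)  # warm-up call so compilation time is not measured
    best = Inf
    for _ in 1:trials
        t = @elapsed f!(C, A, B)
        best = min(best, t)
    end
    return gflops(M, K, N, best)
end

println(best_gflops(mul!, 200, 200, 200))
```

Any function with the signature `f!(C, A, B)` can be passed in place of `mul!`, so the same helper can be used to compare different multiplication routines at a given size.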

Related Packages

Julia Package     CPU    GPU
Gaius.jl          Yes    No
GemmKernels.jl    No     Yes
Octavian.jl       Yes    No
Tullio.jl         Yes    Yes

In general:

  • Octavian has the fastest CPU performance.
  • GemmKernels has the fastest GPU performance.
  • Tullio is the most flexible.