Multi-CPU and Multi-GPU Parallelism with Dagger.jl
Julian Samaroo (MIT JuliaLab)
What Developers Want
Multi-Threaded Parallelism
Multi-GPU Parallelism
Easy, Intuitive Parallel APIs
Benefits
Challenges
Multi-Threaded Parallelism
Benefits
Challenges
GPU Parallelism
Benefits
Challenges
Existing APIs
But can we have all three?
Enter stage left: Dagger.jl
Don't reinvent the wheel - build simple, consistent APIs on a solid heterogeneous foundation, complete with a task runtime and scheduler, and...
Model Everything
What to model:
Model Everything?
What this gets us
Lots of parallelism
Performance Scalability
Show me the code!
# Cholesky
Dagger.spawn_datadeps() do
    for k in range(1, mt)
        Dagger.@spawn LAPACK.potrf!('L', ReadWrite(M[k, k]))
        for m in range(k+1, mt)
            Dagger.@spawn BLAS.trsm!('R', 'L', 'T', 'N', 1.0,
                                     Read(M[k, k]), ReadWrite(M[m, k]))
        end
        for n in range(k+1, nt)
            Dagger.@spawn BLAS.syrk!('L', 'N', -1.0,
                                     Read(M[n, k]), 1.0,
                                     ReadWrite(M[n, n]))
            for m in range(n+1, mt)
                Dagger.@spawn BLAS.gemm!('N', 'T', -1.0,
                                         Read(M[m, k]), Read(M[n, k]),
                                         1.0, ReadWrite(M[m, n]))
            end
        end
    end
end
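To see what the loop nest above computes, here is a minimal sequential sketch of the same tiled Cholesky using only Julia's LinearAlgebra standard library. It is not Dagger code: the `Read`/`ReadWrite` wrappers and `Dagger.@spawn` are dropped, the function name `tiled_cholesky!` is hypothetical, and the tiles are assumed to be stored as a plain `Matrix{Matrix{Float64}}`.

```julia
using LinearAlgebra

# Hypothetical helper: sequential version of the tiled loop nest.
# M is an mt-by-nt grid of square Float64 tiles; only the lower
# triangle of the tile grid is read or written.
function tiled_cholesky!(M::Matrix{Matrix{Float64}})
    mt, nt = size(M)
    for k in 1:mt
        # Factor the diagonal tile in place (lower triangle holds L_kk).
        LAPACK.potrf!('L', M[k, k])
        for m in k+1:mt
            # Panel solve: M[m, k] <- M[m, k] * inv(L_kk')
            BLAS.trsm!('R', 'L', 'T', 'N', 1.0, M[k, k], M[m, k])
        end
        for n in k+1:nt
            # Diagonal update: M[n, n] <- M[n, n] - M[n, k] * M[n, k]'
            BLAS.syrk!('L', 'N', -1.0, M[n, k], 1.0, M[n, n])
            for m in n+1:mt
                # Off-diagonal update: M[m, n] <- M[m, n] - M[m, k] * M[n, k]'
                BLAS.gemm!('N', 'T', -1.0, M[m, k], M[n, k], 1.0, M[m, n])
            end
        end
    end
    return M
end

# Build a small symmetric positive definite test matrix (deterministic).
n, b = 8, 4                      # matrix size and tile size (assumed values)
mt = nt = n ÷ b
B = [sin(i * j) for i in 1:n, j in 1:n]
A = B * B' + n * I

# Cut A into tiles (each tile is an independent copy), then factor.
tile(i) = (i - 1) * b + 1 : i * b
M = [A[tile(i), tile(j)] for i in 1:mt, j in 1:nt]
tiled_cholesky!(M)

# Reassemble the tiles and compare against the dense factorization.
result = zeros(n, n)
for i in 1:mt, j in 1:nt
    result[tile(i), tile(j)] = M[i, j]
end
matches = tril(result) ≈ Matrix(cholesky(Symmetric(A)).L)
```

Each call in the sequential version touches only a few tiles, which is what lets a runtime overlap independent calls once it knows which tiles each one reads and writes.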
Show me the code! (Explained)
# Start a "Datadeps region"
Dagger.spawn_datadeps() do
    ...
end
Show me the code! (Explained)
Dagger.spawn_datadeps() do
    # Launch some tasks
    for k in range(1, mt)
        Dagger.@spawn LAPACK.potrf!(...)
    end
end
Show me the code! (Explained)
# Specify our "data dependencies"
LAPACK.potrf!('L', ReadWrite(M[k, k]))
BLAS.gemm!('N', 'T', -1.0,
           Read(M[m, k]), Read(M[n, k]),
           1.0, ReadWrite(M[m, n]))
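As a rough illustration of why the read/write annotations matter, the sketch below derives ordering constraints from them in plain Julia. It is a toy model, not Dagger's scheduler or its API: the `TaskSpec` type, the `dependencies` function, and the tile names are all hypothetical, and a `ReadWrite` argument is modeled by listing the tile under both `reads` and `writes`.

```julia
# Hypothetical record of which tiles a task reads and writes.
struct TaskSpec
    name::String
    reads::Vector{Symbol}
    writes::Vector{Symbol}
end

# A later task must wait for an earlier one if they share a tile
# and at least one of the two writes to it.
function dependencies(tasks::Vector{TaskSpec})
    deps = Dict{String,Vector{String}}()
    for (j, tj) in enumerate(tasks)
        deps[tj.name] = String[]
        for i in 1:j-1
            ti = tasks[i]
            touched = union(tj.reads, tj.writes)
            conflict = !isempty(intersect(ti.writes, touched)) ||
                       !isempty(intersect(ti.reads, tj.writes))
            conflict && push!(deps[tj.name], ti.name)
        end
    end
    return deps
end

# First steps of a 3x3 tiled Cholesky, in program order.
tasks = [
    TaskSpec("potrf(1,1)", [:M11], [:M11]),
    TaskSpec("trsm(2,1)", [:M11, :M21], [:M21]),
    TaskSpec("trsm(3,1)", [:M11, :M31], [:M31]),
    TaskSpec("syrk(2,2)", [:M21, :M22], [:M22]),
]
deps = dependencies(tasks)
```

In this model the two `trsm` tasks both wait on `potrf(1,1)` but not on each other, so they are free to run in parallel, while `syrk(2,2)` only has to wait for `trsm(2,1)`.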
Show me the code! (Explained)
# Use a single CUDA GPU
using DaggerGPU, CUDA
scope = Dagger.scope(cuda_gpu=1)
Dagger.with_options(;scope) do
    Dagger.spawn_datadeps() do
        ...
    end
end
Show me the code! (Explained)
# Use two AMD GPUs
using DaggerGPU, AMDGPU
scope = Dagger.scope(rocm_gpus=[1,2])
Dagger.with_options(;scope) do
    Dagger.spawn_datadeps() do
        ...
    end
end
Show me the code! (Explained)
# Call a GPU-parallel Cholesky
using Dagger, DaggerGPU, Metal
A = [...]
DA = view(A, AutoBlocks())::DArray
scope = Dagger.scope(;metal_gpu=1)
C = Dagger.with_options(;scope) do
    # cholesky(::DArray) uses Datadeps internally
    cholesky(DA)
end
What Dagger generates
Upstream:
To be merged:
More to come!