Unified Task Parallelism
Julian Samaroo (MIT)
What if every function you called was parallel and scalable?
What if every function you called was ready and able to run on GPUs?
What if every function you called adapted itself to run optimally on your hardware?
What if you didn't have to rewrite your code each time to run on one thread, 100 threads, or 16 GPUs?
What if this library already existed in Julia?
What is this vision, really?
Building a scalable, heterogeneous computing library that has all the APIs users need, with a sensible and consistent design, all built on a single, simple task API.
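That single, simple task API, in a minimal sketch (the computations here are just illustrative):

using Dagger

# Spawn independent tasks; Dagger schedules them across threads and workers
a = Dagger.@spawn sum(rand(1000))
b = Dagger.@spawn sum(rand(1000))
# Passing task handles as arguments builds the dependency graph implicitly
c = Dagger.@spawn a + b
fetch(c)  # wait for the whole graph and return the result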
A show of hands
This is easy with Dagger.jl
# Cholesky: tiled factorization with automatic task dependencies
using Dagger
using LinearAlgebra: BLAS, LAPACK

Dagger.spawn_datadeps() do
    for k in 1:mt
        # Factorize the diagonal tile
        Dagger.@spawn LAPACK.potrf!('L', ReadWrite(M[k, k]))
        for m in k+1:mt
            # Triangular solve for the panel below the diagonal
            Dagger.@spawn BLAS.trsm!('R', 'L', 'T', 'N', 1.0,
                Read(M[k, k]), ReadWrite(M[m, k]))
        end
        for n in k+1:nt
            # Symmetric rank-k update of the trailing diagonal tile
            Dagger.@spawn BLAS.syrk!('L', 'N', -1.0,
                Read(M[n, k]), 1.0, ReadWrite(M[n, n]))
            for m in n+1:mt
                # Update the trailing off-diagonal tiles
                Dagger.@spawn BLAS.gemm!('N', 'T', -1.0,
                    Read(M[m, k]), Read(M[n, k]), 1.0, ReadWrite(M[m, n]))
            end
        end
    end
end
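For context, one way the tile grid M above might be set up. This sketch is an assumption, not from the slide (the names N and nb, and the use of views, are illustrative); it builds a symmetric positive definite matrix and partitions it into mt × nt block views:

using LinearAlgebra
N, nb = 1024, 256
mt = nt = N ÷ nb  # number of tile rows/columns
A = randn(N, N)
A = A * A' + N * I  # make the input symmetric positive definite
M = [view(A, (i-1)*nb+1:i*nb, (j-1)*nb+1:j*nb) for i in 1:mt, j in 1:nt]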
# Start an SPMD region with one rank per thread
X_all = spmd(Threads.nthreads()) do
    # Each rank gets its own ID
    rank = spmd_rank()
    X = rand(4, 4)
    for iter in 1:niters
        # Do a local computation on each rank
        X .*= 3
        # Do a collective reduction across all ranks
        spmd_reduce!(+, X)
    end
    return X
end
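Since spmd_reduce! acts as an allreduce, every rank should end with the same X. Assuming spmd collects each rank's return value into X_all (an assumption about its return convention), a quick sanity check:

# All per-rank results should match after the final allreduce
@assert all(X -> X ≈ first(X_all), X_all)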
# Distribution Analysis
using Dagger, DataFrames, Statistics

function analysis(dists, lens, K=1000)
    res = DataFrame()
    @sync for T in dists
        dist = T()
        # Standard deviation of each distribution, computed as a task
        σ = Dagger.@spawn std(dist)
        for L in lens
            z = Dagger.@spawn max_mean(dist, L, K, σ)
            push!(res, (; T, σ, L, z))
        end
    end
    # Replace task handles in every column with their fetched results
    mapcols!(col -> fetch.(col), res)
    return res
end
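max_mean is defined elsewhere in the talk; here is a hypothetical stand-in and a call, just to make the example self-contained. Distributions.jl provides std for distribution objects, and Dagger fetches the σ task before calling max_mean, so σ arrives as a plain number:

using Distributions

# Hypothetical stand-in: the largest of K sample means from length-L draws,
# scaled by the distribution's standard deviation σ
max_mean(dist, L, K, σ) = maximum(mean(rand(dist, L)) for _ in 1:K) / σ

res = analysis([Normal, Exponential], [10, 100, 1000])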
using Dagger, LinearAlgebra

# Allocate a DArray, letting Dagger choose the block partitioning
A = rand(AutoBlocks(), 1024, 1024)
# Matmul
B = A * A
# Broadcast
C = B .* A ./ 3
# QR, keeping the upper-triangular R factor
D = qr(C).R
# Triangular solve
X = rand(AutoBlocks(), 1024)
ldiv!(D, X)
# Broadcast (in-place)
X .+= 2
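To move data between local and distributed forms, a sketch (assuming collect and distribute behave as in Dagger's DArray documentation):

# Gather the distributed vector back into a plain Array
x_local = collect(X)
# Re-distribute a local matrix with an explicit 256×256 block size;
# AutoBlocks above instead lets Dagger choose the partitioning
A2 = distribute(collect(A), Blocks(256, 256))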
[Diagram: Dagger's APIs, one per example above: Tasks, Arrays, Datadeps, SPMD]
Dagger meets you at your problem
[Diagram: Arrays, Data Flow, Acceleration]
Dagger knows your hardware
[Diagram: Devices, Measured Metrics]
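One place that hardware knowledge surfaces in the API is scopes, which constrain where a task may run. A minimal sketch (the worker and thread values are illustrative):

using Dagger

# Restrict this task to thread 1 of worker 1; without a scope,
# the scheduler places tasks itself using its measured metrics
t = Dagger.@spawn scope=Dagger.scope(worker=1, thread=1) sum(rand(100))
fetch(t)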
Features
Dagger is convenient
Why not that other parallelism package XYZ.jl?
Barriers to adoption
Give Dagger a try on your problem, and reach out if you run into any trouble! Contributions are always welcome :)