The spectralDNS project
Associate Professor Mikael Mortensen
Department of Mathematics
Division of Mechanics
University of Oslo
PCCFD at KAUST 24/5 - 2017
Part I: Triply periodic domains/Fourier - parallel scaling
Part II: Automating the Spectral Galerkin method
The spectralDNS project
Started out as a toy project for
M. Mortensen and Hans Petter Langtangen
3D + time and HUGE resolution
Serious number crunching - extreme computing
We knew Direct Numerical Simulations (DNS) of turbulence were performed with low-level Fortran or C
Would it be possible to write a competitive, state of the art turbulence solver in Python?
From scratch?
(Not just wrapping a Fortran solver)
Using only numpy/scipy/mpi4py?
Can we make this solver run on thousands of cores ?
With billions of unknowns?
As fast as C?
What does it take to build a spectral DNS solver?
Triply periodic domain
Fourier space
Equations:
What does it take to build a spectral DNS solver?
Navier Stokes in spectral (Fourier) space:
Pseudo-spectral treatment of nonlinearity
Some timestepping routine (e.g., 4th order Runge-Kutta)
Classical pseudo-spectral Navier-Stokes solver
The kind that has been used to break records since the day of the first supercomputer
What does it really take to implement a spectral DNS solver?
Numpy is designed around structured arrays that can be reshaped, resized, sliced, transposed, etc.
Most array operations are performed in C and are very fast
(1) Structured grid
(2) 3D parallel FFT
Numpy wraps FFTpack
pyFFTW wraps FFTW
No MPI implementation
But MPI for Python (mpi4py, Lisandro Dalcin) wraps almost the entire MPI library and is as efficient as low-level C
Implementation
A few computational details
A computational box (mesh) needs to be distributed between a (possibly) large number of processors
Mesh size:
Pencil
Slab
CPUs
CPUs
3D FFT with MPI
in a nutshell
real FFT in z, complex in y
No MPI required
Transpose and distribute
Then FFT in last direction
- 2D FFTs in Y-Z planes (local)
- transpose operation
- distribute data (MPI Alltoall)
- FFT in X-direction
What does it really take?
4 operations, four lines of Python code!
Pencil decomposition
in a nutshell
rfft(u, axis=2)
Transform
fft(u, axis=0)
Transform
fft(u, axis=1)
7 operations
-
FFT z-dir
-
transpose
-
communicate
-
FFT y-dir
-
transpose
-
communicate
-
FFT x-dir
7 lines of code!
With parallel 3D FFT we may develop efficient DNS solvers
Enstrophy colored by pressure
which turns out to be very easy, and
100 lines of compact Python code is enough to implement
Ok, so it was relatively easy to implement
How about speed and parallel scaling?
At first it scaled poorly on my laptop
Then it scaled somewhat better on Norway's largest computer
But then Hans Petter contacted the Davids
Parallell scaling
Shaheen Blue Gene P supercomputer at KAUST Supercomputing Laboratory
Testing the 100 lines of pure Python code
C++(slab) 20-30 % faster than Python (slab)
Good strong scaling
mesh =
Why slower than C++?
Actually, the numpy ufuncs used to compute the cross product is the main reason
Moving cross product to Cython or Numba
Hardcoding the cross product into one single loop
@jit(float[:,:,:,:](float[:,:,:,:], float[:,:,:,:], float[:,:,:,:]), nopython=True)
def cross(a, b, c):
for i in xrange(a.shape[1]):
for j in xrange(a.shape[2]):
for k in xrange(a.shape[3]):
a0 = a[0,i,j,k]
a1 = a[1,i,j,k]
a2 = a[2,i,j,k]
b0 = b[0,i,j,k]
b1 = b[1,i,j,k]
b2 = b[2,i,j,k]
c[0,i,j,k] = a1*b2 - a2*b1
c[1,i,j,k] = a2*b0 - a0*b2
c[2,i,j,k] = a0*b1 - a1*b0
return c
Either Cython or Numba will all do the trick. A few lines more
Numba and Cython perform approximately the same
and as fast as C++!
Approximately 5 times faster than original
Solver optimized with Cython
More than matches C++ on the Blue Gene P!
And later on Shaheen II
mesh size = per task
65536 tasks!
mesh =
per task
Word of warning
2048 nodes, or 65536 tasks, take about 13 minutes to load Python and get started
Should have been using larger mesh
Publicly available packages for parallel FFT in
mpiFFT4py
mpi4py-fft
https://bitbucket.org/mpi4py/mpi4py-fft
With Lisandro Dalcin at KAUST
Completely generic implementation ~ 500 lines of Python code
Faster and more accurate pseudospectral DNS through adaptive high-order time integration
- Higher-order RK pairs reliably outperform RK4
- Step-size adaptivity can be a big win and is never disadvantageous
- Method of choice: Bogacki-Shampine 5/4 (BS5) with adaptivity
- David Ketcheson
- Nathanael Schilling
- Matteo Parsani
- Mikael Mortensen
SIAM CSE 2017
Part II:
Evolving from just Fourier and Navier Stokes
Shenfun
Automating the spectral Galerkin method
for tensor product domains
and non-periodic
basis functions
Prof. Jie Shen
Purdue University
meets chebfun
The Spectral Galerkin method
- Choose global basis (test) functions satisfying boundary conditions. We use modified Chebyshev or Legendre polynomials (J. Shen), or Fourier exponentials
- Transform PDEs to weak variational forms using the method of weighted residuals
- Solve system of equations
Solution
Expansion coefficients
(unknowns)
Test functions
Spectral Galerkin
Poisson example
1) Create a basis that satisfies boundary conditions, e.g.,
(Legendre polynomials)
2) Transform PDE to variational form
3) Solve linear system of equations
Trial function
Test function
Shen's modified basis leads to very sparse matrices
3 steps - Sounds easy?
Let's automate!
from fenics import *
mesh = UnitSquareMesh(8, 8)
V = FunctionSpace(mesh, 'CG', 1)
# Define variational problem
u = TrialFunction(V)
v = TestFunction(V)
f = Constant(-6.0)
a = inner(grad(u), grad(v))*dx
L = f*v*dx
Solve equations with code very similar to mathematics
However
- Finite accuracy (not spectral)
- Not optimal for simple tensor product grids or DNS
How to automate?
What do we know about
automating the solution
of PDE'S?
Shenfun
Python package (~3k loc) for automating the spectral Galerkin method
for high performance computing on tensor product grids
A FEniCS like interface with a simple form language
Extremely fast, order optimal, direct solvers for Poisson, Helmholtz and Biharmonic problems (sparse diagonal arrays)
MPI - invisible to the user mpi4py-fft (Lisandro Dalcin)
Fourier - Chebyshev - Legendre
modified basis functions
What else is there?
Chebfun
Spectral collocation, MATLAB, structured, no MPI
Semtex
Spectral element, MPI, C++/Fortran, 2D unstructured (1 Fourier)
Quite low-level, nowhere near as easily configured as FEniCS for new types of equations.
Very good for Navier-Stokes
Nek5000 (Nekbox)
Spectral element, Fortran, MPI, 3D unstructured
Nektar++
Spectral hp-element, C++, 3D unstructured
Nothing really for fast development of new spectral solvers aimed at high performance computing on supercomputers
Shenfun implementation of 1D Poisson problem
import numpy as np
from shenfun import TestFunction, TrialFunction, inner, div, grad, chebyshev
N = 20
V = chebyshev.ShenDirichletBasis(N, plan=True)
v = TestFunction(V)
u = TrialFunction(V)
# Assemble stiffness matrix
A = inner(v, div(grad(u)))
# Some consistent right hand side
f = Function(V, False)
f[:] = -2
# Assemble right hand side
f_hat = inner(v, f)
# Solve equations and transform spectral solution to real space
f_hat = A.solve(f_hat)
fj = V.backward(f_hat)
x = V.mesh(N)
assert np.allclose(fj, 1-x**2)
Very similar to FEniCS!
just with spectral accuracy
(1) Create basis
(2) Variational forms
(3) Solve system
2D/3D/4D.. Tensor product spaces
Combinations of Fourier and Chebyshev/Legendre bases
Any Cartesian combination of spaces (ultimately)
With solutions of the form:
Unknown expansion coefficients
Tensor product spaces
Combinations of Fourier and Chebyshev/Legendre bases
from shenfun import *
from mpi4py import MPI
comm = MPI.COMM_WORLD
N = (20, 30)
V0 = chebyshev.ShenDirichletBasis(N[0])
V1 = fourier.R2CBasis(N[1])
T = TensorProductSpace(comm, (V0, V1))
v = TestFunction(T)
u = TrialFunction(T)
A = inner(v, div(grad(u)))
# Or 3D
V2 = fourier.C2CBasis(40)
T3 = TensorProductSpace(comm, (V0, V2, V1))
v = TestFunction(T3)
u = TrialFunction(T3)
A = inner(v, div(grad(u)))
MPI under the hood
using mpi4py-fft
on tensor product grids like:
A simple form language
to make implementation very similar to mathematics
Mathamatical operators act on test, trial, or functions with data
V = TensorProductSpace(comm, (V0, V1, V2))
# Form arguments:
v = TestFunction(V)
u = TrialFunction(V)
u_ = Function(V, False)
# Forms
du = grad(u_)
d2u = div(du)
cu = curl(du)
dudx = Dx(u_, 0)
Inner products and projections
A = inner(v, u) # Bilinear form, assembles to matrices
f = inner(v, u_) # Linear form, assembles to vector
project(div(grad(u_)), V) # projection to V
Fast transforms
(not in FEniCS)
u_ = V.backward(u_hat, u_)
Any spectral function is defined as
3D:
1D:
This is computed on the tensor product grid as a backwards transform,
e.g., a discrete Fast Fourier inverse transform
A non-periodic basis uses fast Chebyshev(Legendre) transforms
Fast forward transforms
by L2-projections
u_hat = V.forward(u_, u_hat)
which is achieved using a forward transform
Implemented with FFTs or fast Chebyshev transforms
Vector spaces
VectorTensorProductSpace
import numpy as np
from shenfun import *
N = 16
K0 = C2CBasis(N)
K1 = C2CBasis(N)
K2 = R2CBasis(N)
T = TensorProductSpace(comm, (K0, K1, K2))
# Scalar function u_
u_ = Function(T, False)
u_hat = Function(T)
u_[:] = np.random.random(u_.shape)
u_hat = T.forward(u_, u_hat)
# Now define vector space to allow computation of grad(u_)
TV = VectorTensorProductSpace([T]*3)
du_hat = project(grad(u_), TV, uh_hat=u_hat)
Nonlinearities
are all treated with pseudospectral techniques
(1) Compute inverse
u = T.backward(u_hat)
with dealiasing using
3/2-rule (padding) or 2/3-rule
(3) Transform back to spectral space
w = u*u
(2) Compute nonlinearity
w_hat = T.forward(w)
T = TensorProductSpace(comm, (V0, V1, V2))
Complex Ginzburg-Landau
example
2D periodic (x, y) \( \in [-50, 50]^2 \)
Use Fourier in both directions
Weak form, only spatial discretization:
Complex Ginzburg-Landau
example
2D periodic (x, y) \( \in [-50, 50]^2 \)
import sympy
from shenfun.fourier.bases import R2CBasis, C2CBasis
from shenfun import inner, grad, TestFunction, TrialFunction, \
Function, TensorProductSpace
# Use sympy to set up initial condition
x, y = sympy.symbols("x,y")
ue = (x + y)*sympy.exp(-0.03*(x**2+y**2))
N = (200, 200)
K0 = C2CBasis(N[0], domain=(-50., 50.))
K1 = C2CBasis(N[1], domain=(-50., 50.))
T = TensorProductSpace(comm, (K0, K1))
u = TrialFunction(T)
v = TestFunction(T)
U = Function(T, False)
def compute_rhs(rhs, u_hat, U, T):
rhs = inner(grad(v), -grad(U), output_array=rhs, uh_hat=u_hat)
rhs /= (2*np.pi)**2
rhs += u_hat
rhs -= T.forward((1+1.5j)*U*abs(U)**2, w0)
return rhs
# And, e.g., Runge-Kutta integrator
#
#
Key code:
Navier Stokes
Turbulent channel flow
3D tensor product space
2 Fourier bases +
Modified Chebyshev bases for wall normal direction
Kim and Moin modified version of Navier-Stokes:
is convection
Wall normal velocity
Wall normal vorticity
Navier Stokes
Turbulent channel flow
First fully spectral DNS with
3D Tensor product mesh:
Shaheen II at KAUST
Thank you for your time!
http://github.com/spectralDNS
Before I go I just want to point you attention towards a conference organised in the memory of my brilliant colleague
International Conference on Computational Science and Engineering, Simula October 23-25, 2017
Questions?
Presented at the Predictive Complex Computational Fluid Dynamics conference 24/5-2017
By Mikael Mortensen
Presented at the Predictive Complex Computational Fluid Dynamics conference 24/5-2017
- 724