AOE 5984: Introduction to Parallel Computing Applications
OpenFOAM for CFD Applications
Lecture 2: Parallel Computing
Professor Eric Paterson
Aerospace and Ocean Engineering, Virginia Tech
14 November 2013
Objectives
- Explain methodology of parallel simulation with OpenFOAM
- Identify tools for parallel computing
- Explore some of the options
- Outcomes
- Students will:
- know which tools to use for decomposing and recomposing data
- understand parallel-computing command-line options for OpenFOAM solvers and utilities
- understand domain-decomposition models available in OpenFOAM
- be able to run basic tutorials in both serial and parallel on BlueRidge
- appreciate the need for analysis and visualization of decomposed data
- be prepared to undertake OpenFOAM Homework #2
Motivation
Why do we want to use Parallel Computing?
- Cases run faster
- Larger cases can be run
Parallel Model
- The method of parallel computing used by OpenFOAM, and by CFD in general, is known as domain decomposition.
- Domain Decomposition Methods (DDM) are a specialized field of mathematics and computational science, http://www.ddm.org
- In DDM, the geometry and associated fields are broken into pieces and allocated to separate processors for solution.
Parallel Utilities
To support DDM in OpenFOAM, there are 4 parallel processing utilities:
[03:03:53][egp@egpMBP:parallelProcessing]542$ pwd
/Users/egp/OpenFOAM/OpenFOAM-2.2.x/applications/utilities/parallelProcessing
[03:04:09][egp@egpMBP:parallelProcessing]543$ ls -l
total 0
drwxr-xr-x 3 egp staff 918 May 17 09:51 decomposePar
drwxr-xr-x 3 egp staff 204 May 17 09:51 reconstructPar
drwxr-xr-x 3 egp staff 170 May 17 09:52 reconstructParMesh
drwxr-xr-x 3 egp staff 272 May 17 09:52 redistributePar
Process
- Case preparation
- Parallel execution
- Data analysis and visualization
- Reconstruction, only for special cases (a command-level outline of these steps follows below)
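As a quick orientation, the four steps above map onto the following command-level sketch (a rough outline, assuming a case already set up for serial running and the BlueRidge PBS environment introduced later in this lecture):
decomposePar                                      # 1. case preparation: split mesh and fields
mpirun -np $PBS_NP simpleFoam -parallel           # 2. parallel execution of the solver
mpirun -np $PBS_NP sample -latestTime -parallel   # 3. data analysis on the decomposed data
reconstructPar -latestTime                        # 4. reconstruction, only if really needed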
Case Preparation
- Typically, the case will be prepared in one piece: serial mesh generation
- decomposePar: the parallel decomposition tool, controlled by the dictionary decomposeParDict
- Options in the dictionary allow choice of decomposition method and auxiliary data
- Upon decomposition, processorNN directories are created containing the decomposed mesh and fields; solution controls, model choices, and discretization parameters are shared. Each CPU may use local disk space; on BlueRidge, however, disk space is shared across all compute nodes
- decomposePar -cellDist writes the cell-to-processor decomposition for visualization (see the sketch below)
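One way to use this, as a minimal sketch (the color-by-cellDist step is done interactively in ParaView):
decomposePar -cellDist    # also writes cellDist as a volScalarField
paraFoam                  # open the case and color the mesh by cellDist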
decomposePar
- The goal of decomposition is to break up the domain with minimal effort but in such a way as to guarantee a fairly economic solution (i.e., load balanced).
- The geometry and fields are decomposed according to a set of parameters specified in a dictionary named decomposeParDict that must be located in the system directory.
- In decomposeParDict the user must set the number of subdomains into which the case should be decomposed; usually this corresponds to the number of cores available for the calculation.
- For example, on BlueRidge, where each node has 16 cores, a 4-node simulation would use 64 processors and hence 64 subdomains (see the sketch below).
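A minimal decomposeParDict for that 64-core case might look like the following sketch (the method choice is illustrative; scotch, shown here, needs no coefficients, and the FoamFile header is the standard boilerplate):
FoamFile
{
    version     2.0;
    format      ascii;
    class       dictionary;
    object      decomposeParDict;
}

numberOfSubdomains  64;

method              scotch;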
decomposePar
- The user has a choice of seven methods of decomposition, specified by the method keyword.
- For each method there is a set of coefficients, specified in a sub-dictionary of decomposeParDict named <method>Coeffs, used to instruct the decomposition process:
- simple: simple geometric decomposition in which the domain is split into pieces by direction, e.g., 2 pieces in the x-direction, 1 in y, etc.
- hierarchical: hierarchical geometric decomposition, which is the same as simple except that the user specifies the order in which the directional splits are done, e.g., first in the y-direction, then the x-direction, etc.
decomposePar
- metis: METIS decomposition, which requires no geometric input from the user and attempts to minimize the number of processor boundaries. The user can specify a weighting for the decomposition between processors, which can be useful on machines with differing performance between processors.
- scotch: similar technology to METIS, with a more flexible open-source license, http://www.labri.fr/perso/pelegrin/scotch/
- manual: manual decomposition, where the user directly specifies the allocation of each cell to a particular processor (see the sketch after this list)
- multiLevel: similar to hierarchical, but any of the methods can be used in a nested form
- structured: a special case for meshes with a layered (extruded) structure
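The manual method pairs naturally with the cellDecomposition file that decomposePar -cellDist writes (see the log later in this lecture); a minimal sketch, where the file name is whatever cell-to-processor file you supply:
method manual;
manualCoeffs
{
    dataFile    "cellDecomposition";
}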
pitzDailyParallel
- Copy the pitzDaily tutorial and use it as an example
- Copy the boilerplate decomposeParDict from $FOAM_UTILITIES
[03:41:22][egp@brlogin1:simpleFoam]13058$ cp -rf pitzDaily pitzDailyParallel
[03:42:42][egp@brlogin1:pitzDailyParallel]13066$ cp $FOAM_UTILITIES/parallelProcessing/decomposePar/decomposeParDict system/.
[Figure: pitzDaily backward-facing step tutorial geometry]
pitzDailyParallel
output from decomposePar
[05:12:36][egp@brlogin1:pitzDailyParallel]13097$ decomposePar -cellDist
/*---------------------------------------------------------------------------*\
| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  2.2.0                                 |
|   \\  /    A nd           | Web:      www.OpenFOAM.org                      |
|    \\/     M anipulation  |                                                 |
\*---------------------------------------------------------------------------*/
Build : 2.2.0
Exec : decomposePar -cellDist
Date : Nov 14 2013
Time : 05:12:46
Host : "brlogin1"
PID : 61396
Case : /home/egp/OpenFOAM/egp-2.2.0/run/tutorials/incompressible/simpleFoam/pitzDailyParallel
nProcs : 1
sigFpe : Floating point exception trapping - not supported on this platform
fileModificationChecking : Monitoring run-time modified files using timeStampMaster
allowSystemOperations : Disallowing user-supplied system call operations
// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
Create time
Decomposing mesh region0
Removing 4 existing processor directories
Create mesh
Calculating distribution of cells
Selecting decompositionMethod scotch
Finished decomposition in 0.07 s
Calculating original mesh data
Distributing cells to processors
Distributing faces to processors
Distributing points to processors
Constructing processor meshes
Processor 0
Number of cells = 3056
Number of faces shared with processor 1 = 86
Number of faces shared with processor 2 = 60
Number of processor patches = 2
Number of processor faces = 146
Number of boundary faces = 6222
Processor 1
Number of cells = 3056
Number of faces shared with processor 0 = 86
Number of processor patches = 1
Number of processor faces = 86
Number of boundary faces = 6274
Processor 2
Number of cells = 3065
Number of faces shared with processor 0 = 60
Number of faces shared with processor 3 = 57
Number of processor patches = 2
Number of processor faces = 117
Number of boundary faces = 6237
Processor 3
Number of cells = 3048
Number of faces shared with processor 2 = 57
Number of processor patches = 1
Number of processor faces = 57
Number of boundary faces = 6277
Number of processor faces = 203
Max number of cells = 3065 (0.286299% above average 3056.25)
Max number of processor patches = 2 (33.3333% above average 1.5)
Max number of faces between processors = 146 (43.8424% above average 101.5)
Wrote decomposition to "/home/egp/OpenFOAM/egp-2.2.0/run/tutorials/incompressible/simpleFoam/pitzDailyParallel/constant/cellDecomposition" for use in manual decomposition.
Wrote decomposition as volScalarField to cellDist for use in postprocessing.
Time = 0
Processor 0: field transfer
Processor 1: field transfer
Processor 2: field transfer
Processor 3: field transfer
End.
Scotch Decomposition
method scotch;
scotchCoeffs
{
}
simple Decomposition
method simple;
simpleCoeffs
{
    n       (2 2 1);
    delta   0.001;
}
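The hierarchical method described earlier uses the same coefficients plus an ordering of the directional splits; a sketch (the 2 2 1 split and yxz order are illustrative values, not taken from the tutorial):
method hierarchical;
hierarchicalCoeffs
{
    n       (2 2 1);
    delta   0.001;
    order   yxz;
}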
Parallel Execution
- Top-level code does not change between serial and parallel execution: operations
related to parallel support are embedded in the library
- Launch executable using mpirun with -parallel option
- Data in time directories is created on a per-processor basis
mpirun -np $PBS_NP simpleFoam -parallel 2>&1 | tee log.simpleFoam
- $PBS_NP is an environment variable that holds the number of requested cores
- mpirun is an application that launches the parallel job and farms out the tasks to the cores listed in $PBS_NODEFILE (a sample job script is sketched below)
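On BlueRidge, the mpirun line normally sits inside a PBS submission script. A minimal sketch is shown below; the queue name, module names, and walltime are placeholders (site-specific assumptions), not verified BlueRidge settings:
#!/bin/bash
#PBS -l nodes=4:ppn=16            # 4 nodes x 16 cores/node = 64 processes
#PBS -l walltime=01:00:00         # placeholder walltime
#PBS -q normal_q                  # queue name is an assumption
cd $PBS_O_WORKDIR                 # run from the directory the job was submitted from
module load openmpi               # module names are site-specific assumptions
mpirun -np $PBS_NP simpleFoam -parallel 2>&1 | tee log.simpleFoam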
Parallel Performance
- In Homework #2, you will need to perform a parallel performance study.
- To increase the size of your mesh, edit constant/polyMesh/blockMeshDict and increase the cell counts in each direction (a hypothetical edit is shown after the blocks listing). BE CAREFUL! This is fragile!
blocks
(
hex (0 6 7 1 22 28 29 23) (18 7 1) simpleGrading (0.5 1.8 1)
hex (1 7 8 2 23 29 30 24) (18 10 1) simpleGrading (0.5 4 1)
hex (2 8 9 3 24 30 31 25) (18 13 1) simpleGrading (0.5 0.25 1)
hex (4 10 11 5 26 32 33 27) (180 18 1) simpleGrading (4 1 1)
hex (5 11 12 6 27 33 34 28) (180 9 1) edgeGrading (4 4 4 4 0.5 1 1 0.5 1 1 1 1)
hex (6 12 13 7 28 34 35 29) (180 7 1) edgeGrading (4 4 4 4 1.8 1 1 1.8 1 1 1 1)
hex (7 13 14 8 29 35 36 30) (180 10 1) edgeGrading (4 4 4 4 4 1 1 4 1 1 1 1)
hex (8 14 15 9 30 36 37 31) (180 13 1) simpleGrading (4 0.25 1)
hex (10 16 17 11 32 38 39 33) (25 18 1) simpleGrading (2.5 1 1)
hex (11 17 18 12 33 39 40 34) (25 9 1) simpleGrading (2.5 1 1)
hex (12 18 19 13 34 40 41 35) (25 7 1) simpleGrading (2.5 1 1)
hex (13 19 20 14 35 41 42 36) (25 10 1) simpleGrading (2.5 1 1)
hex (14 20 21 15 36 42 43 37) (25 13 1) simpleGrading (2.5 0.25 1)
);
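For instance, a uniform refinement doubles the cell counts of every block while leaving the grading untouched; the first block above would become (an illustrative edit, not part of the original tutorial):
hex (0 6 7 1 22 28 29 23) (36 14 1) simpleGrading (0.5 1.8 1)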
blockMesh set-up
- blockMesh is a simple algebraic mesh generator in OpenFOAM
- It requires that you have a map of the points and blocks
- This is important when adjusting mesh size for parallel performance studies
- To generate a new mesh, run blockMesh (a sketch of the full sequence follows below)
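A possible sequence after editing blockMeshDict, assuming the case was already decomposed once (the -force option overwrites the existing processor directories):
blockMesh              # regenerate the mesh from blockMeshDict
checkMesh              # sanity-check the new mesh
decomposePar -force    # re-decompose, overwriting processor* directories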
Data Analysis
- Post-processing utilities, including sampling tools, will execute correctly in parallel, e.g.,
- mpirun -np $PBS_NP vorticity -parallel
- mpirun -np $PBS_NP Q -parallel
- mpirun -np $PBS_NP sample -parallel
- etc.
- This is VERY IMPORTANT for large simulations. You want to avoid reconstruction, because:
- It is time consuming
- It duplicates large datasets and uses a lot of disk space
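When reconstruction truly is needed (the special cases noted earlier), limiting it to the most recent time step reduces both the run time and the duplicated disk space, e.g.:
reconstructPar -latestTime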
sample utility for pitzDaily
- The sample utility can do many things
- set sampling along lines and point clouds
- surface sampling on planes, patches, isoSurfaces, and specified triangulated surfaces
- It allows you to focus on lightweight data rather than the entire dataset!
- To start, copy the boilerplate sampleDict from $FOAM_UTILITIES
- As an example, let's sample pitzDaily along the centerline and a vertical profile, and on a surface a uniform distance from the wall. The next slides describe the modifications (the remaining top-level output settings are sketched after the sets).
- Then run sample in parallel
cp $FOAM_UTILITIES/postProcessing/sampling/sample/sampleDict system/.
mpirun -np 4 sample -latestTime -parallel
sampleDict
sample velocity, pressure, and turbulence variables
fields
(
p
U
nut
k
epsilon
);
surfaces
(
nearWalls_interpolated
{
// Sample cell values off patch. Does not need to be the near-wall
// cell, can be arbitrarily far away.
type patchInternalField;
patches ( ".*Wall.*" );
interpolate true;
offsetMode normal;
distance 0.0001;
}
);
sampleDict, cont.
sets
(
centerline
{
type midPointAndFace;
axis x;
start (0.0 0.0 0.0);
end (0.3 0.0 0.0);
}
verticalProfile
{
type midPointAndFace;
axis y;
start (0.206 -0.03 0.0);
end (0.206 0.03 0.0);
}
);
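The boilerplate sampleDict also carries top-level entries that control interpolation and output format; typical values (illustrative settings of the kind found in the copied boilerplate) look like:
interpolationScheme cellPoint;

setFormat       raw;
surfaceFormat   vtk;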
[Figures: centerline plot, vertical-profile plot, and near-wall surfaces showing axial velocity and surface oil streamlines using line-integral convolution (LIC)]
Visualization
- Most common visualization tools read decomposed (parallel) OpenFOAM data: ParaView, Tecplot, EnSight, FieldView. Again, avoid the use of reconstructPar.
Next lecture: more on data analysis and visualization