Parallel Computing
Outline
- What is parallel programming
- Parallel computer
  - Flynn's classic taxonomy
  - Heterogeneous computing
- Parallel programming model
  - Shared memory model
  - Distributed memory model
  - Hybrid model
What is parallel programming
Sequential program
- The single program is computed by one processor
- Instructions are handled one after another
- Only one instruction may execute at any moment
Parallel program
- Break a program into several parts
- Instructions from each part execute simultaneously (a minimal sketch follows)
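As an illustration (not from the original slides), the sketch below sums an array first sequentially and then with the loop broken into parts via OpenMP; the array size and contents are arbitrary placeholders. Compile with e.g. `gcc -fopenmp sum.c`.

```c
#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void) {
    static double a[N];
    double sum = 0.0;
    for (int i = 0; i < N; ++i) a[i] = 1.0;   /* arbitrary data */

    /* Sequential: one processor handles each iteration in turn. */
    for (int i = 0; i < N; ++i) sum += a[i];
    printf("sequential sum = %f\n", sum);

    /* Parallel: the loop is broken into parts and the parts
     * execute simultaneously, one per thread/core. */
    sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; ++i) sum += a[i];
    printf("parallel sum   = %f\n", sum);
    return 0;
}
```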
Why parallel programming
- Advantages
  - Higher performance: solve larger problems
  - Better resource utilization: takes advantage of multi-core processors
- Disadvantages
  - Harder to program
  - Harder to debug
  - Not all problems can be parallelized efficiently (dependencies)
Parallel programs & applications
- Scientific applications
- Computer animations
- Computer games
- Image processing
- Data mining
Outline
- What is parallel programming
- Parallel computer
  - Flynn's classic taxonomy
  - Heterogeneous computing
- Parallel programming model
  - Shared memory model
  - Distributed memory model
  - Hybrid model
Parallel computer
- Flynn's classic taxonomy
- Heterogeneous computing
Parallel computer classification
- Flynn's Classical Taxonomy
  - Classifies machines by the instruction and data streams seen by their processing units
  - SISD
  - MISD
  - SIMD
  - MIMD
SISD
- Single Instruction, Single Data (SISD)
- A serial (non-parallel) computer
- Executes a single instruction stream to operate on data stored in a single memory
- Examples: old mainframes, single-core processors
SIMD
- Single Instruction, Multiple Data (SIMD)
- Multiple processing elements perform the same operation on multiple data points concurrently (illustrated below)
- Examples: GPUs, vector/pipelined computers
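As a hedged illustration (the slides only name the category), the SSE sketch below issues a single add instruction that operates on four floats at once; `_mm_loadu_ps`, `_mm_add_ps`, and `_mm_storeu_ps` are standard SSE intrinsics on x86-64.

```c
#include <stdio.h>
#include <immintrin.h>   /* SSE intrinsics */

int main(void) {
    float x[4] = {1, 2, 3, 4};
    float y[4] = {10, 20, 30, 40};
    float out[4];

    __m128 vx = _mm_loadu_ps(x);       /* load 4 floats */
    __m128 vy = _mm_loadu_ps(y);
    __m128 vs = _mm_add_ps(vx, vy);    /* ONE instruction, FOUR additions */
    _mm_storeu_ps(out, vs);

    for (int i = 0; i < 4; ++i) printf("%f\n", out[i]);
    return 0;
}
```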
MISD
- Multiple Instruction, Single Data (MISD)
- Many functional units perform different operations on the same data
- Fault-tolerant computers execute the same instructions redundantly to detect and mask errors
- Example: space shuttle
MIMD
- Multiple Instruction, Multiple Data (MIMD)
- At any time, different processors may be executing different instructions on different data
- Examples: most modern computers, multi-core PCs, supercomputers, clusters
Heterogeneous Computing
- Heterogeneous computing is an integrated system that consists of different types of (programmable) computing units
  - DSP (digital signal processor)
  - FPGA (field-programmable gate array)
  - ASIC (application-specific integrated circuit)
  - GPU (graphics processing unit)
  - Co-processor (Intel Xeon Phi)
- CPU vs. GPU
  - CPU is a latency-oriented design: it can do lots of sophisticated control
  - GPU is a throughput-oriented design: long latency per operation, but heavily pipelined for high throughput
Latency vs. Throughput
(figure)

Performance Trend
(figure)
Outline
- What is parallel programming
- Parallel computer
  - Flynn's classic taxonomy
  - Heterogeneous computing
- Parallel programming model
  - Shared memory model
  - Distributed memory model
  - Hybrid model
Parallel programming model
- Shared memory model
- Distributed memory model
- Hybrid model
Shared Memory Model
- Memory can be simultaneously accessed by multiple processes, with the intent to provide communication among them or to avoid redundant data copies
Shared Memory/Thread Model
- A single process can have multiple, concurrent execution paths
- Threads have local data, but also share resources
- Threads communicate through global memory
- Threads can come and go, but the main program remains to provide the necessary shared resources until the application has completed
Shared Memory/Thread Model
- Implementation methodologies (a sketch follows)
  - A library of subroutines called from parallel source code
    - e.g., POSIX Threads (Pthreads)
  - A set of compiler directives embedded in either serial or parallel source code
    - e.g., OpenMP
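A minimal sketch of the library style, assuming Pthreads as named above; the thread count, array size, and `worker` function are illustrative placeholders. Compile with `-pthread`.

```c
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define N 1000

double data[N];             /* shared by all threads (global memory) */
double partial[NTHREADS];   /* one result slot per thread */

void *worker(void *arg) {
    long id = (long)arg;
    long lo = id * (N / NTHREADS), hi = lo + N / NTHREADS;
    double s = 0.0;
    for (long i = lo; i < hi; ++i) s += data[i];
    partial[id] = s;        /* write only to this thread's own slot */
    return NULL;
}

int main(void) {
    pthread_t tid[NTHREADS];
    for (long i = 0; i < N; ++i) data[i] = 1.0;

    for (long t = 0; t < NTHREADS; ++t)
        pthread_create(&tid[t], NULL, worker, (void *)t);

    double sum = 0.0;
    for (long t = 0; t < NTHREADS; ++t) {
        pthread_join(tid[t], NULL);   /* wait for thread t to finish */
        sum += partial[t];
    }
    printf("sum = %f\n", sum);
    return 0;
}
```

In the directive style, the same loop could instead stay serial and carry a single OpenMP pragma such as `#pragma omp parallel for reduction(+:sum)`.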
Shared Memory/Thread Model
- Important issues
  - Race condition: a situation where the computed output depends on the order in which the processes execute (illustrated below)
  - Deadlock: two or more competing actions are each waiting for the other to finish
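A hedged sketch of a race condition (not from the slides): two threads increment a shared counter without synchronization, so increments can be lost; swapping `racy` for `safe` serializes the updates with a mutex.

```c
#include <pthread.h>
#include <stdio.h>

long counter = 0;                                  /* shared state */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *racy(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; ++i)
        counter++;   /* read-modify-write is NOT atomic: two threads
                      * can interleave here and lose updates */
    return NULL;
}

void *safe(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; ++i) {
        pthread_mutex_lock(&lock);    /* one thread in here at a time */
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, racy, NULL);   /* swap in `safe` to fix */
    pthread_create(&b, NULL, racy, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("counter = %ld (expected 200000)\n", counter);
    return 0;
}
```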
Distributed Memory/MPI Model
- A set of tasks that use their own local memory during computation
- Tasks exchange data by sending and receiving messages
  - i.e., explicit memory copies between address spaces
- Implementation: MPI
  - An API specification that allows computers to communicate by means of send, receive, broadcast, etc. (a minimal sketch follows)
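A minimal MPI sketch under the assumption of two ranks (not the course's own example): rank 0 sends one integer that rank 1 receives, i.e., an explicit memory copy across the network. Run with e.g. `mpirun -np 2 ./a.out`.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int payload = 42;                 /* arbitrary example value */
    if (rank == 0) {
        /* Send one int to rank 1 with message tag 0. */
        MPI_Send(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int received;
        MPI_Recv(&received, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d from rank 0\n", received);
    }

    MPI_Finalize();
    return 0;
}
```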
Distributed Memory/MPI Model
- Important issues
  - Synchronization: the programmer must ensure the correctness of timing dependencies between processes
  - Communication time
    - Network speed is much slower than CPU speed
    - Network latency adds a constant delay to every message
Hybrid Parallel Computing Model
- Combines both shared & distributed memory
- MPI + Pthreads/OpenMP
  - Implement parallelism with MPI libraries among nodes
  - Implement parallelism with Pthreads/OpenMP within each node (a sketch follows)
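A hedged hybrid sketch, assuming MPI for the inter-node level and OpenMP within each node as listed above; the problem (summing 0..N-1) and the requested thread-support level are illustrative choices. Compile with an MPI wrapper plus `-fopenmp`.

```c
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int provided, rank, nranks;
    /* Request a thread level where the main thread makes the MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    /* Distributed level: each rank (node) owns one slice of 0..N-1. */
    const long N = 1000000;                            /* illustrative size */
    long chunk = N / nranks;
    long lo = rank * chunk;
    long hi = (rank == nranks - 1) ? N : lo + chunk;   /* last rank takes the tail */

    /* Shared-memory level: OpenMP threads split this rank's slice. */
    double local = 0.0;
    #pragma omp parallel for reduction(+:local)
    for (long i = lo; i < hi; ++i) local += (double)i;

    /* Combine the per-node results across the network. */
    double total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("total = %f\n", total);

    MPI_Finalize();
    return 0;
}
```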
Q & A
By zlsh80826