Parallel Computing
With Python
Jason Champion
@Xangis
github.com/Xangis
Why GPGPU?
Supercomputing with high performance per watt.
Physics simulations, medical imaging, oil and gas exploration.
Weather, particle systems, gravity.
Gigaflops Performance
(Approximate)
- Intel Core 2 Duo: 20 GFLOPS
- Intel i5, AMD Phenom II x4: 50 GFLOPS
- Intel i7: 100 GFLOPS
- NVIDIA GeForce 630M: 300 GFLOPS
- NVIDIA GeForce 640, 650M, 750M: 700 GFLOPS
- Intel Xeon Phi: 1000-1200 GFLOPS
- NVIDIA GTX 760, AMD Radeon 6950: 2200 GFLOPS
- NVIDIA GeForce GTX Titan: 4500 GFLOPS
- AMD Radeon 7990: 8200 GFLOPS (face melt!)
Speed of the FASTEST Supercomputer
1993: 60 GFLOPS (~ Intel Core i5)
1995: 220 GFLOPS (~ modern dual Xeon)
1997: 1.3 TFLOPS (~ Intel Xeon Phi)
1999: 2.4 TFLOPS (~ Radeon 6950)
2002: 35 TFLOPS (~ quad Radeon 7990)
2005: 280 TFLOPS
2008: 1 PFLOPS
2010: 10 PFLOPS
2013: 33 PFLOPS
The equivalent of a $100 million 1998-era supercomputer can be had for $200.
Toys!
Intel Xeon Phi

Toys!
NVIDIA GeForce GTX Titan

Toys!
AMD Radeon 7990

Before GPGPU
Win32 Threads
The "dinner from a diaper" of parallelism. In C.
Being replaced by C++ AMP, but still Windows-only.
Pthreads
"Old reliable" works great for non-GPU uses once you know it.
OpenMP
Makes life easy in the multicore CPU world.
(If you're doing multicore work outside of Python, it's well worth learning.)
Early GPGPU
No dedicated computing language or APIs.
People used programmable graphics shaders
to perform calculations.
Slightly less fun than writing assembly language.
NVIDIA CUDA
Modified C programming language.
First general-purpose GPU computing API.
Only runs on NVIDIA hardware.
OpenCL
More complex than CUDA.
General-purpose C-based computing API.
A standard from the Khronos Group (the same group behind OpenGL, with a similar API style).
Runs on CPUs and GPUs.
PyOpenCL was used for poclbm, an OpenCL Bitcoin miner. (See the minimal sketch below.)
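To give a feel for the flavor, here is a minimal sketch of PyOpenCL in action: a vector add. It assumes a working OpenCL platform and a PyOpenCL install; the kernel name and array sizes are illustrative only.

import numpy as np
import pyopencl as cl

# Two input arrays on the host (CPU) side.
a = np.random.rand(50000).astype(np.float32)
b = np.random.rand(50000).astype(np.float32)

ctx = cl.create_some_context()   # picks (or asks for) a device
queue = cl.CommandQueue(ctx)

mf = cl.mem_flags
a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

# The kernel itself is still C (see Pitfalls).
prg = cl.Program(ctx, """
__kernel void add(__global const float *a,
                  __global const float *b,
                  __global float *out) {
    int gid = get_global_id(0);
    out[gid] = a[gid] + b[gid];
}
""").build()

prg.add(queue, a.shape, None, a_buf, b_buf, out_buf)

result = np.empty_like(a)
cl.enqueue_copy(queue, result, out_buf)   # copy back to the host
print(np.allclose(result, a + b))         # True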
Python for Supercomputing
Fortran, C, and Assembly consistently benchmark as
the fastest programming languages.
The global interpreter lock (GIL) prevents CPU-based,
OpenMP-style threading from working well in Python (a workaround sketch follows below).
Python is widely known as being slow,
so why use it for supercomputing?
Python for Supercomputing
It's optimized for the developer, and programmer time often costs more than CPU time.
Python has awesome libraries, especially for
data visualization.
Fast prototyping.
Can (somewhat) easily port to C if more speed is needed after the idea is proven, but you probably won't need to.
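Regarding the GIL point above: the usual Python workaround on multicore CPUs is processes rather than threads. A minimal sketch using the standard-library multiprocessing module; the work function is purely illustrative.

from multiprocessing import Pool

def work(n):
    # Hypothetical CPU-bound task: sum of squares up to n.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    # One worker process per CPU core by default; each has its own GIL.
    pool = Pool()
    results = pool.map(work, [1000000] * 8)
    pool.close()
    pool.join()
    print(results[:2])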
Prerequisites
AMD:
The AMD APP SDK
NVIDIA:
The NVIDIA CUDA SDK
Intel:
The Intel Xeon Phi SDK
Get them from the manufacturers' websites (see Resources).
NumPy, SciPy (pip handles these)
Installing PyOpenCL
For the lazy Linux user:
sudo apt-get install python-pyopencl
It's on PyPI:
https://pypi.python.org/pypi/pyopencl
(pip install pyopencl)
Apple + AMD users are out of luck.
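Once installed, a quick sanity check (assuming at least one OpenCL platform is present) is to list what PyOpenCL can see:

import pyopencl as cl

# Print every OpenCL platform and the devices it exposes.
for platform in cl.get_platforms():
    print(platform.name)
    for device in platform.get_devices():
        print("  " + device.name)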
Pitfalls / Hassles
Ubuntu 13.10 with NVIDIA Optimus (Bumblebee) is not awesome. One of the many reasons Linus gave NVIDIA the finger.

Installing PyCUDA
For the lazy Linux user:
sudo apt-get install python-pycuda
It's on PyPI:
https://pypi.python.org/pypi/pycuda
(pip install pycuda)
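Before the Mandelbrot example, here is a minimal sketch of the PyCUDA workflow (assuming a CUDA-capable NVIDIA GPU and the CUDA SDK): double every element of an array on the GPU. The kernel name and sizes are illustrative only.

import numpy as np
import pycuda.autoinit            # creates a context on the default GPU
import pycuda.driver as cuda
from pycuda.compiler import SourceModule

# The kernel is still C (CUDA C, to be exact).
mod = SourceModule("""
__global__ void double_them(float *a) {
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    a[idx] *= 2.0f;
}
""")

a = np.random.randn(400).astype(np.float32)
a_gpu = cuda.mem_alloc(a.nbytes)      # allocate device memory
cuda.memcpy_htod(a_gpu, a)            # host -> device copy

double_them = mod.get_function("double_them")
double_them(a_gpu, block=(400, 1, 1), grid=(1, 1))

result = np.empty_like(a)
cuda.memcpy_dtoh(result, a_gpu)       # device -> host copy
print(np.allclose(result, a * 2))     # True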
Trivial Example:
Mandelbrot Set Fractal
PyCUDA
[Source in Console]
From:
http://craneium.net/index.php?option=com_content&view=category&layout=blog&id=37&Itemid=97
Less Trivial Examples:
N-body Gravity Simulator
CUDA
Ocean Simulator
OpenCL
[See video]
Cryptocurrency on the GPU:
Don't Do It!
GPUs are much better for cryptographic calculations than CPUs.
Bitcoin doesn't use GPUs anymore, it uses custom hardware.
Litecoin and Feathercoin still mine reasonably well on GPUs.
Your return on investment, including hardware amortization and electricity costs, will break even at best.
Just buy the coins on an exchange. Or create a better cryptocurrency that doesn't depend on environment-damaging power use.
OpenGL Interop
Your data is already on the GPU. Why not just render it?
Memory copies are suddenly not a problem.
That's what's happening with the ocean simulator.
Pitfalls
You have to know where your memory is.
You have to learn a good deal about GPU architecture to use it well.
You have to think way too much about what memory you're using and what data you're copying where.
The actual GPU programs are still in C.
Did I mention thinking too much about memory?
Resources
Lots of books; pretty much all of them focus on C:
CUDA by Example
The CUDA Handbook
OpenCL Programming Guide
Website of Andreas Klöckner, creator of PyCUDA and PyOpenCL:
http://mathema.tician.de/software/
Good documentation and examples on the wiki.
This presentation and trivial test apps for CUDA and OpenCL:
https://slid.es/xangis/parallel-computing/ ; http://github.com/Xangis
More Resources
NVIDIA CUDA SDK:
https://developer.nvidia.com/cuda-downloads
LOTS of great code samples in the SDK.
AMD APP SDK (OpenCL):
http://developer.amd.com/tools-and-sdks/heterogeneous-computing/amd-accelerated-parallel-processing-app-sdk/
Intel OpenCL SDK:
http://software.intel.com/en-us/vcsource/tools/opencl-sdk