Multiprocessing in Python
David Taylor
"prooffreader"
data scientist, blogger, pythonista, nerd
thatsthejoke.jpg
Disclaimer
Serial computing
START
Do a thing
Do another thing
Do a third thing
Three things are done!
Parallel computing
START
SPLIT
Do a thing | Do another thing | Do a third thing
COMBINE
Three things are done!
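The serial and parallel flows above could be sketched roughly as follows. This is only an illustration, not code from the talk: `do_a_thing`, `run_serial`, and `run_parallel` are hypothetical names, and squaring a number stands in for whatever "a thing" really is.

```python
from multiprocessing import Pool


def do_a_thing(n):
    """Hypothetical stand-in for one unit of work: square a number."""
    return n * n


def run_serial(items):
    """Serial: do each thing one after another, in a single process."""
    return [do_a_thing(n) for n in items]


def run_parallel(items, processes=3):
    """Parallel: SPLIT the items across worker processes, then COMBINE."""
    with Pool(processes=processes) as pool:
        # map() hands the items to the workers and returns the results
        # in the same order as the inputs.
        return pool.map(do_a_thing, items)


if __name__ == "__main__":
    things = [1, 2, 3]
    print(run_serial(things))    # [1, 4, 9]
    print(run_parallel(things))  # [1, 4, 9]
```

Both versions return the same list; only the way the work is scheduled differs. The `if __name__ == "__main__":` guard matters on platforms that start workers by re-importing the script.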
Code
Interpreter
Program/Bytecode
Kernel
Process
Thread
.py
/usr/bin/python
.pyc
OS
RAM
Multithreading
Same diagram as above, now with three Threads inside the one Process
Multiprocessing
Same diagram, now with three Processes and three Threads, plus CPUs
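One way to see the difference between the two diagrams is to ask each worker which process it is running in. The sketch below is an illustration under assumptions: `report_pid`, `pids_from_threads`, and `pids_from_processes` are made-up helper names, and the pool sizes are arbitrary.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from multiprocessing import Pool


def report_pid(_):
    """Return the ID of the process that ran this call."""
    return os.getpid()


def pids_from_threads(n_tasks, n_threads=3):
    """Run the tasks on threads; all of them live in the current process."""
    with ThreadPoolExecutor(max_workers=n_threads) as executor:
        return set(executor.map(report_pid, range(n_tasks)))


def pids_from_processes(n_tasks, n_processes=3):
    """Run the tasks in a pool of separate worker processes."""
    with Pool(processes=n_processes) as pool:
        return set(pool.map(report_pid, range(n_tasks)))


if __name__ == "__main__":
    print("CPUs reported by the OS:", os.cpu_count())
    print("main process:", os.getpid())
    print("threads ran in:", pids_from_threads(6))      # only the main PID
    print("processes ran in:", pids_from_processes(6))  # worker PIDs, not the main one
```

Threads share their parent's process (and its memory), while each pool worker is a separate process that the OS can schedule on a different CPU.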
Python:
GIL
(Global Interpreter Lock)
Only one thread runs Python bytecode at a time, for most tasks
Therefore, one Python process effectively uses only one core
multiprocessing module
(alternative: concurrent.futures in Py 3)
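As a rough sketch of the `concurrent.futures` alternative, `ProcessPoolExecutor` offers a `map()` much like the one on a `multiprocessing` pool. The workload here (`count_primes_below`) and the wrapper `count_in_parallel` are hypothetical and chosen only because they are CPU-bound, which is the kind of task the GIL keeps threads from speeding up.

```python
from concurrent.futures import ProcessPoolExecutor


def count_primes_below(limit):
    """CPU-bound work, chosen only for illustration (trial division)."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_in_parallel(limits, max_workers=None):
    """Run one count per limit, each in a worker process."""
    # max_workers=None lets the library pick a default based on the machine.
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(count_primes_below, limits))


if __name__ == "__main__":
    print(count_in_parallel([10, 100, 1000]))  # [4, 25, 168]
```

Because each worker is its own process with its own interpreter and its own GIL, the counts can run on separate cores at the same time.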
Chart: time (sec) vs. number of cores, in three panels
30,000 items, 1 operation per number
30,000 items, 20 operations per number
1,000,000 items, 1 operation per number
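A timing comparison along these lines could be set up roughly as below. This is not the benchmark behind the chart: the workload in `work`, the helper `time_run`, and the pool sizes tried are all assumptions, and the numbers it prints will depend on the machine.

```python
import time
from multiprocessing import Pool


def work(args):
    """Apply `operations` rounds of arithmetic to one number (hypothetical workload)."""
    number, operations = args
    for _ in range(operations):
        number = (number * 3 + 1) % 1_000_003
    return number


def time_run(n_items, operations, processes):
    """Return (seconds, results) for one combination of workload and pool size."""
    tasks = [(n, operations) for n in range(n_items)]
    start = time.perf_counter()
    if processes == 1:
        results = [work(t) for t in tasks]  # plain serial baseline, no pool
    else:
        with Pool(processes=processes) as pool:
            results = pool.map(work, tasks)
    return time.perf_counter() - start, results


if __name__ == "__main__":
    for n_items, operations in [(30_000, 1), (30_000, 20), (1_000_000, 1)]:
        for processes in (1, 2, 4):
            seconds, _ = time_run(n_items, operations, processes)
            print(f"{n_items:>9,} items, {operations:>2} ops, "
                  f"{processes} procs: {seconds:.3f} s")
```

The measured time for a pool includes starting the workers and sending data to and from them, so small or cheap workloads may not get faster with more processes.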
CPython
vs.
Cython
(also PyPy/JIT/RPython)
Normal CPython code
Compiled CPython Code
56% faster
Compiled Cython Code
47X faster!
Libraries with
C binaries
e.g. Numpy
Multicomputing example
Desktop
pymongo, boto, pysftp
Digital Ocean
MongoDB server
AWS
20 instances +
Epilogue:
PyParallel,
a proof of concept