Multiprocessing in Python

David Taylor

"prooffreader"

data scientist, blogger, pythonista, nerd

thatsthejoke.jpg

Disclaimer

Serial computing

Do a thing

Do another thing

Do a third thing

Three things are done!

Parallel computing

Do a thing

Do another thing

Do a third thing

Three things are done!

(Diagram: START, SPLIT into three simultaneous tasks, COMBINE)
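The serial and parallel slides above can be sketched in a few lines. This is only an illustration, not code from the talk: `do_a_thing`, `run_serial` and `run_parallel` are hypothetical names, and the "thing" is an arbitrary sum of squares standing in for real work.

```python
from multiprocessing import Pool


def do_a_thing(n):
    """Stand-in for one unit of work: sum of the squares below n."""
    return sum(i * i for i in range(n))


def run_serial(inputs):
    """Serial: do each thing one after another in a single process."""
    return [do_a_thing(n) for n in inputs]


def run_parallel(inputs, workers=3):
    """Parallel: SPLIT the inputs across worker processes, then COMBINE."""
    with Pool(processes=workers) as pool:
        # pool.map hands the items out to the workers and returns the
        # results in the same order as the inputs.
        return pool.map(do_a_thing, inputs)


if __name__ == "__main__":
    inputs = [100_000, 200_000, 300_000]
    assert run_serial(inputs) == run_parallel(inputs)
    print("Three things are done!")
```

Both versions return the same list; only the way the work is scheduled differs. The `if __name__ == "__main__":` guard matters because worker processes may re-import the main module when they start.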

Code (.py)

Interpreter (/usr/bin/python)

Program/Bytecode (.pyc)

Kernel (OS)

Process (RAM)

Thread

Multithreading

(Same diagram as above, with one process running three threads)

Multiprocessing

(Same diagram as above, with three processes, each with its own thread, spread across CPUs)
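One way to see the difference between the multithreading and multiprocessing diagrams is to ask each worker which OS process it is running in. The sketch below is an illustration under that assumption; `report_pid`, `pids_from_threads` and `pids_from_processes` are hypothetical helper names.

```python
import os
import queue
import threading
from multiprocessing import Process, Queue


def report_pid(results):
    """Put the ID of the process this function is running in onto `results`."""
    results.put(os.getpid())


def pids_from_threads(count=3):
    """Run report_pid in `count` threads; all of them share one process."""
    results = queue.Queue()
    threads = [threading.Thread(target=report_pid, args=(results,))
               for _ in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [results.get() for _ in range(count)]


def pids_from_processes(count=3):
    """Run report_pid in `count` child processes; each has its own PID."""
    results = Queue()
    procs = [Process(target=report_pid, args=(results,))
             for _ in range(count)]
    for p in procs:
        p.start()
    # Read the results before joining so the children can flush the queue.
    pids = [results.get() for _ in range(count)]
    for p in procs:
        p.join()
    return pids


if __name__ == "__main__":
    print("parent:   ", os.getpid())
    print("threads:  ", pids_from_threads())    # the parent's PID, three times
    print("processes:", pids_from_processes())  # three different child PIDs
```

Threads report the parent's PID because they live inside the same process and share its memory; child processes each get their own PID and their own memory.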

Python:

GIL (Global Interpreter Lock)

Only one thread runs Python bytecode at a time

Therefore, a single Python process effectively uses one core for CPU-bound work

multiprocessing module

(alternative: concurrent.futures in Python 3.2+)
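As a rough sketch of the two options named on this slide, the same CPU-bound job can be run through `multiprocessing.Pool` or through `concurrent.futures.ProcessPoolExecutor`. The task `count_primes` and the wrapper names are hypothetical, chosen only for illustration.

```python
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import Pool


def count_primes(limit):
    """CPU-bound toy task: count the primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def with_multiprocessing(limits, workers=4):
    """Run count_primes over `limits` using a multiprocessing.Pool."""
    with Pool(processes=workers) as pool:
        return pool.map(count_primes, limits)


def with_concurrent_futures(limits, workers=4):
    """Same job using concurrent.futures.ProcessPoolExecutor."""
    with ProcessPoolExecutor(max_workers=workers) as executor:
        # executor.map returns a lazy iterator, so materialise it as a list.
        return list(executor.map(count_primes, limits))


if __name__ == "__main__":
    limits = [20_000, 30_000, 40_000, 50_000]
    assert with_multiprocessing(limits) == with_concurrent_futures(limits)
    print(with_multiprocessing(limits))
```

Each worker is a separate process with its own interpreter and its own GIL, which is why this approach can use several cores where threads would not.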

(Charts: time in seconds vs. number of cores)

30,000 items, 1 operation per number

30,000 items, 20 operations per number

1,000,000 items, 1 operation per number
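A benchmark along these lines could be reproduced with a sketch like the one below. It is not the code behind the charts: the workload `work`, the helper `time_run`, the process counts, and the choice to run the single-process case without a pool are all assumptions, and timings will vary by machine.

```python
import time
from multiprocessing import Pool


def work(args):
    """Apply `ops` arithmetic operations to one number (hypothetical workload)."""
    number, ops = args
    for _ in range(ops):
        number = (number * 3 + 1) % 1_000_003
    return number


def time_run(n_items, ops, n_procs):
    """Return the seconds taken to process n_items numbers with n_procs processes."""
    tasks = [(i, ops) for i in range(n_items)]
    start = time.perf_counter()
    if n_procs == 1:
        # Serial baseline: a plain loop, with no pool start-up or pickling cost.
        results = [work(t) for t in tasks]
    else:
        with Pool(processes=n_procs) as pool:
            results = pool.map(work, tasks)
    elapsed = time.perf_counter() - start
    assert len(results) == n_items
    return elapsed


if __name__ == "__main__":
    # The three workloads named on the slides; process counts are an assumption.
    for n_items, ops in [(30_000, 1), (30_000, 20), (1_000_000, 1)]:
        for n_procs in (1, 2, 4):
            seconds = time_run(n_items, ops, n_procs)
            print(f"{n_items:>9,} items, {ops:>2} ops, "
                  f"{n_procs} procs: {seconds:.3f} s")
```

When each item needs very little computation, the cost of starting processes and passing data between them can outweigh the gain from extra cores, so more processes is not automatically faster.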

CPython vs. Cython

(also PyPy/JIT/RPython)

Normal CPython code

Compiled CPython code: 56% faster

Compiled Cython code: 47X faster!

Libraries with C binaries, e.g. NumPy

Multicomputing example

Desktop: pymongo, boto, pysftp

Digital Ocean: MongoDB server

AWS: 20+ instances

Epilogue:

PyParallel, a proof of concept
