Python's Parallel Processing Possibilities

Samuel Colvin

Who Am I

  • Built and run TutorCruncher - a SaaS platform built as a monolithic Django app on Heroku

  • Contributor to aiohttp, rq and many more

  • Developer of arq, pydantic and many more

Agenda

I'll try to:

  • Talk about 4 levels of concurrency
  • Demonstrate them using Python
  • Explain why you might (not) use them
  • Make it fun

 

I won't:

  • Prepare you for a CS exam on distributed computing
  • Go into detail on protocols
  • Give an exhaustive description of the technology

The rationale for Parallel Processing

source: www.spec.org, github.com/samuelcolvin/analyze-spec-benchmarks

The Metaphor

1. Multiple Machines

Machine = host/computer/virtual machine/container

RQ

Example

worker.py

import requests

def count_words(year: int):
    resp = requests.get(f'https://ep{year}.europython.eu/en/')
    print(f'{year}: {len(resp.text.split())}')

rq_example.py

from redis import Redis
from rq import Queue
from worker import count_words

q = Queue(connection=Redis())
for year in range(2016, 2020):
    print(q.enqueue(count_words, year))

Multiple Machines - Advantages

  • Scaling is easy
  • Linear cost increase
  • Isolation!

Multiple Machines - Disadvantages

  • Need to take care of networking between the machines
  • Harder to set up in a dev environment
  • No standard library implementation

2. Multiple Processes

  • Processes are an Operating System concept
  • Exist (with a little variation) on all OSes
  • Often used as a stopgap for multiple machines during testing

Processes

Example

from multiprocessing import Process, JoinableQueue
import requests

def count_words(year: int):
    resp = requests.get(f'https://ep{year}.europython.eu/en/')
    print(f'{year}: {len(resp.text.split())} words')

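def worker(id, q):
    # the queue is passed in explicitly so this also works when
    # processes are started with "spawn" (e.g. macOS, Windows)
    while True:
        item = q.get()
        if item is None:
            print('quitting worker', id)
            break
        count_words(item)
        q.task_done()

if __name__ == '__main__':
    q = JoinableQueue()
    processes = []
    for id in range(2):
        p = Process(target=worker, args=(id, q))
        p.start()
        processes.append(p)

    for year in range(2016, 2020):
        q.put(year)

    # wait until every year has been processed
    q.join()

    # one None per worker tells it to quit
    for _ in processes:
        q.put(None)
    for p in processes:
        p.join()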
➤ python multiprocessing_example.py 
2017: 4123 words
2016: 3794 words
2019: 1953 words
2018: 4334 words
quitting worker 0
quitting worker 1

Multiple Processes - Advantages

  • Easy to run
  • OS guarantees memory separation between processes
  • Fast to communicate

Multiple Processes - Disadvantages

  • Limits to scaling
  • Fixed capacity

3. Multiple Threads

  • Threads allow concurrent execution from within a single process
  • Thus multiple threads can access the same memory
  • 2 varieties: kernel threads, user/green threads
  • "Threading" in Python generally refers to kernel threads

Threading

Example

from queue import Queue
from threading import Thread
import requests

def count_words(year: int):
    resp = requests.get(f'https://ep{year}.europython.eu/en/')
    print(f'{year}: {len(resp.text.split())} words')

def worker(id):
    while True:
        item = q.get()
        if item is None:
            print('quitting thread', id)
            break
        count_words(item)
        q.task_done()

q = Queue()
threads = []
for id in range(2):
    t = Thread(target=worker, args=(id,))
    t.start()
    threads.append(t)

for year in range(2016, 2020):
    q.put(year)

q.join()

for _ in threads:
    q.put(None)
for t in threads:
    t.join()
➤ python threading_example.py 
2017: 4123 words
2016: 3794 words
2019: 1953 words
2018: 4334 words
quitting thread 0
quitting thread 1

Multiple Threads - Advantages

  • Lighter than processes
  • Faster to create and switch than processes
  • Share memory (if you dare!)
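
As a minimal sketch of what sharing memory between threads involves: the names below (`counter`, `counter_lock`, `add_many`, `run`) are made up for illustration and are not from the talk. Several threads update one shared integer, and a `threading.Lock` guards the read-modify-write so no increments are lost.

```python
from threading import Lock, Thread

counter = 0            # shared state, visible to every thread
counter_lock = Lock()  # guards every update to counter


def add_many(n: int):
    global counter
    for _ in range(n):
        # "counter += 1" is a read-modify-write; holding the lock
        # stops another thread from interleaving with it
        with counter_lock:
            counter += 1


def run(n_threads: int = 4, n: int = 10_000) -> int:
    global counter
    counter = 0
    threads = [Thread(target=add_many, args=(n,)) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


print(run())
```

Every piece of shared state needs this kind of care, which is why passing messages through a queue, as in the example above, is usually the easier option.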

Multiple Threads - Disadvantages

Memory locking is horrid:

Do not communicate by sharing memory; instead, share memory by communicating.

- Go Proverb

The GIL limits the usefulness of threading with Python:

GIL ... protects access to Python objects, preventing multiple threads from executing Python bytecodes at once

- Python Wiki

The Global Interpreter Lock

from queue import Queue
from threading import Thread
from time import time


def do_calcs(year: int):
    print(sum(range(year * int(1e5))))

t1 = time()
for year in range(2016, 2020):
    do_calcs(year)
t2 = time()
print(f'Time taken without threads: {t2 - t1:0.2f}s')

def worker(id):
    while True:
        item = q.get()
        if item is None:
            print('quitting thread', id)
            break
        do_calcs(item)
        q.task_done()

t3 = time()
...

for year in range(2016, 2020):
    q.put(year)

...
t4 = time()
print(f'Time taken with 2 threads:  {t4 - t3:0.2f}s')
➤  python gil.py 
20321279899200000
20341444899150000
20361619899100000
20381804899050000
Time taken without threads: 7.63s
20321279899200000
20341444899150000
20361619899100000
20381804899050000
quitting thread 1
quitting thread 0
Time taken with 2 threads:  7.65s
from queue import Queue
from threading import Thread
from time import time
import numpy as np

def do_calcs(year: int):
    print(np.sum(np.arange(year * int(1e5))))

t1 = time()
for year in range(2016, 2020):
    do_calcs(year)
t2 = time()
print(f'Time taken without threads: {t2 - t1:0.2f}s')

def worker(id):
    while True:
        item = q.get()
        if item is None:
            print('quitting thread', id)
            break
        do_calcs(item)
        q.task_done()

t3 = time()
...

for year in range(2016, 2020):
    q.put(year)

...
t4 = time()
print(f'Time taken with 2 threads:  {t4 - t3:0.2f}s')
➤  python gil_numpy.py 
20321279899200000
20341444899150000
20361619899100000
20381804899050000
Time taken without threads: 2.36s
20321279899200000
20341444899150000
20381804899050000
20361619899100000
quitting thread 1
quitting thread 0
Time taken with 2 threads:  1.34s
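
For readers who want to try the comparison themselves, here is a compact, self-contained sketch. It is not the code from the slides: it swaps the Queue and Thread workers for `ThreadPoolExecutor.map`, and it uses a much smaller workload (`year * 1_000`) so it finishes quickly. Timings depend on the machine and Python build, so none are quoted.

```python
from concurrent.futures import ThreadPoolExecutor
from time import perf_counter


def do_calcs(year: int) -> int:
    # pure-Python, CPU bound work: the GIL is held throughout
    return sum(range(year * 1_000))


def run_serial(years):
    return [do_calcs(y) for y in years]


def run_threaded(years, workers: int = 2):
    # map() returns results in input order, whichever thread ran them
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(do_calcs, years))


years = range(2016, 2020)
for label, func in (('serial', run_serial), ('2 threads', run_threaded)):
    start = perf_counter()
    results = func(years)
    print(f'{label}: {perf_counter() - start:0.3f}s')
```

Replacing the body of `do_calcs` with the numpy version from the slide above should reproduce the second experiment, assuming numpy is installed.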

4. Asynchronous I/O

  • AKA coroutines/green threads/fibers
  • "Asyncio" in python
  • Cooperative scheduling
  • Mostly (but not always) used for networking tasks
  • based on an event loop which schedules coroutines
  • 1 kernel thread - only one piece of code is running at any time
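
Cooperative scheduling on a single thread can be seen in a small sketch. The names `worker` and `log` are illustrative only. Each coroutine records a step and then awaits, which hands control back to the event loop so the other coroutine can run.

```python
import asyncio
import threading

log = []


async def worker(name: str, steps: int):
    for i in range(steps):
        # record which coroutine ran and on which kernel thread
        log.append((name, i, threading.get_ident()))
        # awaiting yields to the event loop, which resumes the other task
        await asyncio.sleep(0)


async def main():
    await asyncio.gather(worker('a', 3), worker('b', 3))


asyncio.run(main())
print([(name, i) for name, i, _ in log])
# → [('a', 0), ('b', 0), ('a', 1), ('b', 1), ('a', 2), ('b', 2)]
```

The two coroutines take turns, and every entry carries the same thread id: the switching happens only at `await`, never in the middle of ordinary code.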

[Diagrams: Without Asyncio vs. With Asyncio]

Asyncio Example

from aiohttp import ClientSession
import asyncio


async def count_words(year: int):
    async with ClientSession() as session:
        async with session.get(f'https://ep{year}.europython.eu/en/') as resp:
            text = await resp.text()
    print(f'{year}: {len(text.split())} words')


async def main():
    coroutines = []
    for year in range(2016, 2020):
        coroutines.append(count_words(year))
    await asyncio.gather(*coroutines)


asyncio.run(main())
➤ python asyncio_example.py 
2019: 1953 words
2017: 4123 words
2016: 3782 words
2018: 4334 words

Asyncio - Advantages

  • Even lighter - easily run thousands of concurrent tasks
  • Easier to reason about
  • Less risk of memory corruption

Asyncio - Disadvantages

  • By default asyncio provides no speedup for CPU bound tasks
  • Whole new way of thinking
  • Applications have to be entirely rewritten
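
One reason existing code cannot simply be dropped into asyncio is that a blocking call stalls the whole event loop. The sketch below uses made-up names (`blocking_task`, `cooperative_task`, `timed`) and uses `time.sleep` as a stand-in for any blocking call, such as `requests.get`.

```python
import asyncio
import time


async def blocking_task():
    # blocks the single kernel thread: nothing else can run meanwhile
    time.sleep(0.1)


async def cooperative_task():
    # yields to the event loop while waiting
    await asyncio.sleep(0.1)


async def timed(task_factory, n: int = 3) -> float:
    start = time.perf_counter()
    await asyncio.gather(*(task_factory() for _ in range(n)))
    return time.perf_counter() - start


blocking_time = asyncio.run(timed(blocking_task))
cooperative_time = asyncio.run(timed(cooperative_task))
print(f'blocking: {blocking_time:0.2f}s, cooperative: {cooperative_time:0.2f}s')
```

The three blocking tasks run one after another (roughly 0.3s in total), while the three cooperative tasks overlap (roughly 0.1s). Exact figures will vary by machine.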

explicit cooperative scheduling is awesome, but it can't be implicit

- me

This is where it gets tricky

[Diagram: how the four levels (Machines, Processes, Threads, Asyncio) interact. Labels on the diagram:]

  • rq forks the main process to run the worker
  • ThreadPoolExecutor
  • ProcessPoolExecutor
  • aiohttp, arq
  • multiprocessing.Queue

Using Asyncio for Processes and Threading

  • Performance of processes or threads from the comfort of asyncio
  • ThreadPoolExecutor - for file operations 
  • ProcessPoolExecutor - for CPU bound tasks
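
For the file-operation case, a minimal sketch might look like the following. The helper names (`count_words_in_file`, `count_all`, `demo`) and the temporary files are assumptions made for illustration. Ordinary blocking file reads are handed to a `ThreadPoolExecutor` via `loop.run_in_executor`, so the event loop stays free while they run.

```python
import asyncio
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def count_words_in_file(path: Path) -> int:
    # ordinary blocking file I/O, runs in a pool thread
    return len(path.read_text().split())


async def count_all(paths):
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [
            loop.run_in_executor(pool, count_words_in_file, p)
            for p in paths
        ]
        # gather returns results in the same order as paths
        return await asyncio.gather(*futures)


def demo():
    # write three small files containing 1, 2 and 3 words
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i in range(1, 4):
            p = Path(tmp) / f'file{i}.txt'
            p.write_text('word ' * i)
            paths.append(p)
        return asyncio.run(count_all(paths))


print(demo())  # → [1, 2, 3]
```

Swapping `ThreadPoolExecutor` for `ProcessPoolExecutor` follows the same pattern for CPU bound work, provided the function and its arguments can be pickled.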

ThreadPoolExecutor Example

from concurrent.futures import ThreadPoolExecutor
import asyncio
from time import time
import numpy as np

def do_calcs(year: int):
    print(np.sum(np.arange(year * int(1e5))))

async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=2) as pool:
        coroutines = [
            loop.run_in_executor(pool, do_calcs, v)
            for v in range(2016, 2020)
        ]
        await asyncio.gather(*coroutines)

t1 = time()
asyncio.run(main())
print(f'Time taken with 2 threads:  {time() - t1:0.2f}s')
➤  python asyncio_numpy.py 
20321279899200000
20341444899150000
20381804899050000
20361619899100000
Time taken with 2 threads:  1.27s

Summary

  • 4 levels of concurrency: machines, processes, threads, asyncio
  • All possible with (but not limited to) python 
  • All have strengths, weaknesses and pitfalls
  • They often interact with each other

 

It's easy to read the docs but the tricky thing (and what I tried to do today) is understanding the big picture

Thank you

check out:

arq - asyncio version of rq with superpowers

devtools - better debug printing for python

pydantic - Data parsing using Python type hinting

 

 

github.com/samuelcolvin/(arq|python-devtools|pydantic)

Questions?

this presentation: tiny.cc/pythonsppp
