Trying No GIL on Scientific Programming

Get the slides at slides.com/cheukting_ho/trying-no-gil/

Cheuk Ting Ho

https://cheuk.dev

Do you know what the GIL is?

What is the GIL?

Global Interpreter Lock
Only one operating system thread can execute Python bytecode at a time
It ensures that only one thread can access an object at a time
Imagine one thread adding to an object while another deletes it: a lock is needed
Other programs may use multiple locks for this, but that is more complicated than a single GIL
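
A minimal sketch of what the GIL means in practice, assuming a CPU-bound pure-Python loop (the function names and loop size are illustrative and not from the talk). On a standard CPython build the threaded run is usually no faster than the sequential one, because only one thread executes bytecode at a time; the exact timings depend on the machine.

```python
import threading
import time


def count_down(n):
    """CPU-bound pure-Python loop: it holds the GIL while running bytecode."""
    while n > 0:
        n -= 1


def run_sequential(n, workers):
    """Run the loop `workers` times, one after another, and return seconds."""
    start = time.perf_counter()
    for _ in range(workers):
        count_down(n)
    return time.perf_counter() - start


def run_threaded(n, workers):
    """Run the loop in `workers` threads at once and return seconds."""
    threads = [
        threading.Thread(target=count_down, args=(n,)) for _ in range(workers)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n, workers = 1_000_000, 2  # illustrative sizes, not the talk's benchmark
    print(f"sequential: {run_sequential(n, workers):.3f}s")
    print(f"threaded:   {run_threaded(n, workers):.3f}s")
```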

What is no-GIL Python?

A fork of CPython 3.9
The fourth attempt: previous ones were by Greg Stein (1996), Adam Olsen (2007) and Larry Hastings (2016)
By Sam Gross
Why no-GIL? To make use of multiple cores => SPEED

Design and challenges

It needs to perform well in both single-threaded and multi-threaded programs
Challenge: reference counting. Approach: biased reference counting
Make commonly used objects immortal: their reference counts are never updated
Use deferred reference counting for some objects: their counts are reconciled at GC
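
To see why reference counting is the hard part, here is a small sketch using `sys.getrefcount` (the helper name `refcount_delta` is hypothetical). Every new binding updates a counter on the object; without a GIL, those updates must stay correct when several threads touch the same object. Biased and deferred reference counting themselves live in the interpreter's C internals and are not shown here.

```python
import sys


def refcount_delta():
    """Return how much the reference count grows when one alias is added."""
    obj = []                        # a fresh, ordinary (non-immortal) object
    before = sys.getrefcount(obj)
    alias = obj                     # binding another name adds one reference
    after = sys.getrefcount(obj)
    del alias                       # dropping the name removes it again
    return after - before


if __name__ == "__main__":
    print("references added by one alias:", refcount_delta())
```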

Design and challenges

Challenge: thread safety for objects like dict and list
Use small, fine-grained locks
Manually specify the lock ordering using the CPython API
Replace Python’s built-in allocator pymalloc with mimalloc for thread safety
The GC needs to stop the world
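
The idea of small locks plus a fixed lock order can be sketched at the Python level. This is only an analogy: the real work happens in C inside the interpreter, and the `Account` class and `transfer` function below are hypothetical. Each object carries its own lock, and two locks are always taken in the same order so that opposite operations cannot deadlock.

```python
import threading


class Account:
    """Hypothetical shared object guarded by its own small lock."""

    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src, dst, amount):
    """Move `amount` between two distinct accounts."""
    # Always acquire the two locks in one global order (here: by id) so
    # that a transfer a->b and a transfer b->a cannot deadlock each other.
    first, second = sorted((src, dst), key=id)
    with first.lock, second.lock:
        src.balance -= amount
        dst.balance += amount


def run_demo(rounds=1000):
    """Run opposite transfers in two threads and return the final balances."""
    a, b = Account(100), Account(100)

    def a_to_b():
        for _ in range(rounds):
            transfer(a, b, 1)

    def b_to_a():
        for _ in range(rounds):
            transfer(b, a, 1)

    threads = [threading.Thread(target=a_to_b), threading.Thread(target=b_to_a)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a.balance, b.balance


if __name__ == "__main__":
    print(run_demo())
```

With one global lock (the GIL approach) the ordering problem disappears, which is part of why a single lock is simpler than many small ones.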

How does it perform for scientific programs?

Most scientific packages use C extension modules, a JIT compiler or Cython for speed
Do such programs benefit from no GIL?
Test it on some popular scientific tasks

How to test it?

Try something in pure Python
Try something with scikit-learn, NumPy and SciPy
Try something with a neural network
Compare no-GIL, original CPython 3.9 and CPython 3.11
The code I tested is on GitHub
Run the experiments on GitHub Actions (reproducible)
cProfile reports for extra investigation
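
A measurement harness along these lines could look like the sketch below. It is an assumption about the setup, not the code from the talk's repository: `average_runtime` and `profile_report` are hypothetical helpers that average wall-clock time over repeated runs and capture a short cProfile report.

```python
import cProfile
import io
import pstats
import time


def average_runtime(func, repeats=50):
    """Call func() `repeats` times and return the mean wall-clock seconds."""
    total = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        total += time.perf_counter() - start
    return total / repeats


def profile_report(func, top=5):
    """Run func() once under cProfile and return the top entries as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


if __name__ == "__main__":
    def workload():
        # Placeholder workload; each test would plug in its own function.
        return sum(i * i for i in range(10_000))

    print(f"mean: {average_runtime(workload, repeats=10):.6f}s")
    print(profile_report(workload))
```

Running the same script under each interpreter and comparing the printed means gives one row of a results table.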

Test #0 - Fibonacci

Generate first 25 numbers in Fibonacci sequence

- Average over 50 times

No GIL	CPython 3.9	CPython 3.11
0.0242614s	0.0452114s	0.0275933s

* Run on GitHub Actions ubuntu-latest (Ubuntu 22.04) with 2 cores

Significant improvement over CPython 3.9
A bit better than CPython 3.11
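
The slide does not show the benchmark code, so the following is only a plausible pure-Python version of the workload, assuming a naive recursive definition that starts at fib(0) = 0. The function names are illustrative.

```python
def fib(n):
    """Naive recursive Fibonacci: fib(0) == 0, fib(1) == 1."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


def first_fibonacci(count=25):
    """Return the first `count` Fibonacci numbers, starting from fib(0)."""
    return [fib(i) for i in range(count)]


if __name__ == "__main__":
    print(first_fibonacci())
```

Because this runs entirely in the interpreter and makes many small function calls, it measures interpreter speed rather than any C extension.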

Test #1 - SVM

We use the scikit-learn example "Recognizing hand-written digits"

- Average over 50 times

No GIL	CPython 3.9	CPython 3.11
0.0327320s	0.0319601s	0.0295781s

* Run on GitHub Actions ubuntu-latest (Ubuntu 22.04) with 2 cores

No significant difference

Test #2 - Clustering

We use the scikit-learn example "A demo of K-Means clustering on the handwritten digits data"

- Average over 50 times

	No GIL	CPython 3.9	CPython 3.11
k-means++	0.230s	0.176s	0.188s
random	0.032s	0.024s	0.025s
PCA-based	0.015s	0.012s	0.012s

No significant difference (or worse)

* Run on GitHub Actions ubuntu-latest (Ubuntu 22.04) with 2 cores

Test #3 - Decision Tree

We use the Iris data set from the scikit-learn example "Plot the decision surface of decision trees trained on the iris dataset"

- Average over all pairs of features

No GIL	CPython 3.9	CPython 3.11
0.397881ms	0.6451607ms	0.6741285ms
🥇	🥈	🥉

* Run on GitHub Actions ubuntu-latest (Ubuntu 22.04) with 2 cores

Test #4 - Linear algebra

We use the NumPy tutorial "Linear algebra on n-dimensional arrays"

- Average over 50 times

	No GIL	CPython 3.9	CPython 3.11
SVD	0.263492s	0.242731s	0.265867s
Norm	0.0235930s	0.0198416s	0.0237444s
Transpose	1.759529µs	1.850128µs	1.974106µs

* Run on GitHub Actions ubuntu-latest (Ubuntu 22.04) with 2 cores

Test #5 - Image filters

We use the NumPy tutorial "X-ray image processing"

- Average over 50 times

	No GIL	CPython 3.9	CPython 3.11
Laplacian-Gaussian	0.0335324s	0.0298902s	0.0309711s
Gaussian gradient magnitude	0.0711931s	0.0638475s	0.0655634s
Sobel filter	0.0835007s	0.0739401s	0.0758417s
Canny filter	0.0701143s	0.0669602s	0.0633507s

* Run on GitHub Actions ubuntu-latest (Ubuntu 22.04) with 2 cores

Test #6 - MLPClassifier

We use the scikit-learn example "Compare Stochastic learning strategies for MLPClassifier"

- Average over 10 times

No GIL	CPython 3.9	CPython 3.11
3.25005s	2.72408s	2.61342s
🥉	🥈	🥇

* Run on GitHub Actions ubuntu-latest (Ubuntu 22.04) with 2 cores

Does it mean that no GIL does not help?

Why didn't we see much difference?

C extensions are already using multiple threads
C extensions may still expect a GIL
They need to be adapted to no-GIL mode
Compatibility can be an issue
Only compared on a dual-core machine (environment dependent)
The no-GIL Python fork is still a work in progress
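
Since the results depend on the environment, it can help to record the core count and interpreter details next to every timing. The sketch below is an assumption about how one might do that: `describe_environment` is a hypothetical helper, and `sys._is_gil_enabled()` only exists on CPython 3.13 and later, so on older interpreters (including the 3.9-based fork used here) the GIL status is reported as unknown.

```python
import os
import sys


def describe_environment():
    """Collect the facts that most affect a GIL vs. no-GIL comparison."""
    # sys._is_gil_enabled() is only available on CPython 3.13+; when it is
    # missing we report None (unknown) instead of guessing.
    gil_check = getattr(sys, "_is_gil_enabled", None)
    return {
        "python": sys.version.split()[0],
        "cpu_count": os.cpu_count(),
        "gil_enabled": gil_check() if gil_check is not None else None,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```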

So what did we learn?

Python is very versatile
There are different tools for different jobs
Creating a general strategy to solve all problems is impossible
Thank you to Sam and all the maintainers who are making Python and the tools we use better

