Trying No GIL on Scientific Programming

Get the slides at slides.com/cheukting_ho/trying-no-gil/

Do you know what the GIL is?

What is the GIL?

  • Global Interpreter Lock
  • Only one operating system thread can execute Python bytecode at a time
  • Ensures only one thread can access an object at a time
  • Imagine one thread is adding an object while another is deleting it - a lock is needed
  • Other programs may use multiple fine-grained locks instead, but that is more complicated than a single GIL
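The add/delete scenario above can be sketched with a shared list guarded by one lock. This is only an illustration using `threading.Lock`; the names `shared`, `producer` and `consumer` are hypothetical, and the single coarse lock merely mimics the idea of a global lock. It is not how CPython implements the GIL.

```python
import threading

shared = []                # object touched by both threads
removed = []               # items taken out by the consumer
lock = threading.Lock()    # one coarse lock, standing in for the idea of a global lock

def producer(n):
    # Add n items, taking the lock for each mutation.
    for i in range(n):
        with lock:
            shared.append(i)

def consumer(n):
    # Remove n items. The emptiness check and the pop happen under the same
    # lock, so no other thread can change the list in between.
    taken = 0
    while taken < n:
        with lock:
            if shared:
                removed.append(shared.pop())
                taken += 1

threads = [
    threading.Thread(target=producer, args=(1000,)),
    threading.Thread(target=consumer, args=(1000,)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(shared), len(removed))
```

One lock for everything is simple to reason about, which is the trade-off the last bullet describes: finer locks allow more parallelism but need more care.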

What is no-GIL Python?

  • Fork of CPython 3.9
  • 4th attempt - previous ones by Greg Stein (1996), Adam Olsen (2007) and Larry Hastings (2016)
  • By Sam Gross
  • Why no-GIL => make use of multiple cores => SPEED
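A rough way to see the "multiple cores" argument is to run the same CPU-bound pure-Python work sequentially and in threads. The sketch below is an assumption-laden illustration: `count_primes` is a made-up workload, and `sys._is_gil_enabled()` only exists on newer CPython builds (3.13+), not on the 3.9-based fork discussed here, so the code falls back to "unknown" when it is missing. With a GIL, the threaded run is not expected to be faster; without one, it may be.

```python
import sys
import time
from concurrent.futures import ThreadPoolExecutor

def count_primes(limit):
    """Deliberately CPU-bound pure-Python work (hypothetical workload)."""
    found = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            found += 1
    return found

def run_sequential(jobs):
    return [count_primes(j) for j in jobs]

def run_threaded(jobs, workers=2):
    # One thread per job; whether they truly run in parallel depends on the GIL.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_primes, jobs))

jobs = [20_000, 20_000]

start = time.perf_counter()
seq = run_sequential(jobs)
seq_time = time.perf_counter() - start

start = time.perf_counter()
thr = run_threaded(jobs)
thr_time = time.perf_counter() - start

# Assumption: this private helper is only present on CPython 3.13 or later.
gil_check = getattr(sys, "_is_gil_enabled", None)
print("GIL enabled:", gil_check() if gil_check else "unknown")
print(f"sequential {seq_time:.3f}s, threaded {thr_time:.3f}s")
```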

Design and challenges

  • Needs to perform well in both single-threaded and multi-threaded code
  • Challenge - reference counting; solution - biased reference counting
  • Make commonly used objects immortal - no ref count
  • Use deferred reference counting for some objects - counts are added at GC
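The idea behind biased reference counting can be sketched as a toy model: the thread that created an object updates a cheap local count, while every other thread updates a separate shared count that needs synchronisation. The class below is purely hypothetical and written in Python for readability; the real implementation lives in C and uses atomic operations rather than a `threading.Lock`.

```python
import threading

class BiasedRefCount:
    """Toy model: a fast owner-only count plus a synchronised shared count."""

    def __init__(self):
        self.owner = threading.get_ident()    # thread that created the object
        self.local = 1                        # touched only by the owner, no lock
        self.shared = 0                       # touched by all other threads
        self._shared_lock = threading.Lock()  # stands in for an atomic operation

    def incref(self):
        if threading.get_ident() == self.owner:
            self.local += 1                   # fast path, no synchronisation
        else:
            with self._shared_lock:           # slow path for other threads
                self.shared += 1

    def decref(self):
        if threading.get_ident() == self.owner:
            self.local -= 1
        else:
            with self._shared_lock:
                self.shared -= 1

    def total(self):
        with self._shared_lock:
            return self.local + self.shared

obj = BiasedRefCount()
for _ in range(3):
    obj.incref()                              # owner thread: local count only

def other_thread_work():
    for _ in range(1000):
        obj.incref()
    for _ in range(400):
        obj.decref()

workers = [threading.Thread(target=other_thread_work) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(obj.local, obj.shared, obj.total())
```

The point of the split is that most objects are only ever touched by the thread that created them, so the common case stays as cheap as in single-threaded code.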

Design and challenges

  • Challenge - thread safety for objects like dict and list
  • Using small (fine-grained) locks
  • Lock ordering is written manually using the CPython API
  • Replacement of Python’s built-in allocator pymalloc with mimalloc for thread safety
  • Needs to stop the world for GC
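Why lock order matters with small locks can be shown with a short sketch. The `LockedList` class and `move_one` helper are hypothetical names for illustration; the fork does this in C through the CPython API, not with `threading.Lock`. The rule shown is the general one: when an operation needs two locks, always take them in the same global order.

```python
import threading

class LockedList:
    """A list with its own small lock, instead of one global lock."""

    def __init__(self, items=()):
        self.items = list(items)
        self.lock = threading.Lock()

def move_one(src, dst):
    """Move one item from src to dst (two distinct objects)."""
    # Always acquire the lock of the object with the smaller id() first,
    # so two threads moving in opposite directions cannot deadlock.
    first, second = sorted((src, dst), key=id)
    with first.lock:
        with second.lock:
            if src.items:
                dst.items.append(src.items.pop())

a = LockedList(range(500))
b = LockedList(range(500, 1000))

def shuttle(src, dst, times):
    for _ in range(times):
        move_one(src, dst)

threads = [
    threading.Thread(target=shuttle, args=(a, b, 2000)),
    threading.Thread(target=shuttle, args=(b, a, 2000)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(a.items) + len(b.items))
```

If each thread instead locked `src` first and `dst` second, the two threads could each hold one lock and wait forever for the other.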

How does it perform for scientific programs?

  • Most scientific packages use C extension modules, a JIT compiler or Cython for speed-up
  • Do programs benefit from no GIL?
  • Test it on some popular scientific processes

How to test it?

  • Try something using pure Python
  • Try something with scikit-learn, NumPy and SciPy
  • Try something with a neural network
  • Compare no-GIL, original CPython 3.9 and CPython 3.11
  • The code I tested is on GitHub
  • Run the experiments on GitHub Actions (reproducible)
  • cProfile report for extra investigation
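The measuring approach in the list above (average over repeated runs, plus a cProfile report) could look roughly like this. It is a minimal sketch, not the code from the talk's repository: `average_runtime`, `profile_report` and `workload` are hypothetical names, and the workload is a stand-in.

```python
import cProfile
import io
import pstats
import time

def average_runtime(func, repeats=50):
    """Mean wall-clock time of func() over `repeats` runs, in seconds."""
    total = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        total += time.perf_counter() - start
    return total / repeats

def profile_report(func, top=5):
    """Run func() once under cProfile and return the report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()

def workload():
    # Stand-in for one of the tested examples.
    return sum(i * i for i in range(10_000))

mean_seconds = average_runtime(workload, repeats=50)
report = profile_report(workload)
print(f"mean over 50 runs: {mean_seconds:.7f}s")
print(report)
```

Running the same script under each interpreter and comparing the means gives tables like the ones in the following tests.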

Test #0 - Fibonacci

Generate first 25 numbers in Fibonacci sequence

- Average over 50 times

No GIL        CPython 3.9   CPython 3.11
0.0242614s    0.0452114s    0.0275933s

* All tests were run on GitHub Actions ubuntu-latest (Ubuntu 22.04) with 2 cores

Significant improvement over 3.9
A bit better than 3.11
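For reference, a pure-Python workload of this kind can be as small as the sketch below. The talk does not say which algorithm was used, so the iterative version here is an assumption, and `first_fibonacci` is a hypothetical name; absolute timings will differ from the table.

```python
import timeit

def first_fibonacci(count):
    """Return the first `count` Fibonacci numbers, starting from 0."""
    numbers = []
    a, b = 0, 1
    for _ in range(count):
        numbers.append(a)
        a, b = b, a + b
    return numbers

fib25 = first_fibonacci(25)

# Average over 50 runs, mirroring the setup described on the slide.
runs = 50
mean_seconds = timeit.timeit(lambda: first_fibonacci(25), number=runs) / runs

print(fib25)
print(f"mean over {runs} runs: {mean_seconds:.9f}s")
```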

Test #1 - SVM

We use the scikit-learn example "Recognizing hand-written digits"

- Average over 50 times

No GIL        CPython 3.9   CPython 3.11
0.0327320s    0.0319601s    0.0295781s

No significant difference

Test #2 - Clustering

We use the scikit-learn example "A demo of K-Means clustering on the handwritten digits data"

- Average over 50 times

             No GIL    CPython 3.9   CPython 3.11
k-means++    0.230s    0.176s        0.188s
random       0.032s    0.024s        0.025s
PCA-based    0.015s    0.012s        0.012s

No significant difference (or worse)

Test #3 - Decision Tree

We use the Iris data set from the scikit-learn example "Plot the decision surface of decision trees trained on the iris dataset"

- Average over all pairs of features

No GIL        CPython 3.9   CPython 3.11
0.397881ms    0.6451607ms   0.6741285ms
🥇            🥈            🥉

Test #4 - Linear algebra

We use the NumPy tutorial "Linear algebra on n-dimensional arrays"

- Average over 50 times

             No GIL        CPython 3.9   CPython 3.11
SVD          0.263492s     0.242731s     0.265867s
Norm         0.0235930s    0.0198416s    0.0237444s
Transpose    1.759529µs    1.850128µs    1.974106µs

Test #5 - Image filters

We use the NumPy tutorial "X-ray image processing"

- Average over 50 times

                               No GIL        CPython 3.9   CPython 3.11
Laplacian-Gaussian             0.0335324s    0.0298902s    0.0309711s
Gaussian gradient magnitude    0.0711931s    0.0638475s    0.0655634s
Sobel filter                   0.0835007s    0.0739401s    0.0758417s
Canny filter                   0.0701143s    0.0669602s    0.0633507s

Test #6 - MLPClassifier

We use the scikit-learn example "Compare Stochastic learning strategies for MLPClassifier"

- Average over 10 times

No GIL      CPython 3.9   CPython 3.11
3.25005s    2.72408s      2.61342s
🥉          🥈            🥇

Does it mean that no GIL does not help?
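To put the tables side by side, the mean runtimes can be turned into ratios against no-GIL. The sketch below uses only figures copied from the single-row result tables above; the `speedups` helper is a hypothetical name. A ratio above 1 means no-GIL was faster, below 1 means it was slower.

```python
# Mean runtimes from the result tables: (no-GIL, CPython 3.9, CPython 3.11).
# Units differ between tests (s or ms), but the ratios are unit-free.
results = {
    "Fibonacci": (0.0242614, 0.0452114, 0.0275933),
    "SVM": (0.0327320, 0.0319601, 0.0295781),
    "Decision tree": (0.397881, 0.6451607, 0.6741285),
    "MLPClassifier": (3.25005, 2.72408, 2.61342),
}

def speedups(nogil, py39, py311):
    """Return how many times faster no-GIL was than 3.9 and than 3.11."""
    return py39 / nogil, py311 / nogil

table = {name: speedups(*times) for name, times in results.items()}

for name, (vs39, vs311) in table.items():
    print(f"{name:14s} vs 3.9: {vs39:.2f}x   vs 3.11: {vs311:.2f}x")
```

On these numbers no-GIL is clearly ahead for Fibonacci and the decision tree, roughly level for SVM, and behind for MLPClassifier.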

Why didn't we see much difference?

  • C extensions already use multiple threads
  • C extensions may still expect a GIL
  • They need to be adapted to no-GIL mode
  • Compatibility can be an issue
  • Only compared on dual-core machines (environment dependent)
  • The no-GIL Python fork is still a work in progress

So what did we learn?

  • Python is very versatile
  • There are different tools for different jobs
  • Creating a general strategy to solve all problems is impossible
  • Thank you to Sam and all the maintainers who are making Python and the tools we use better
