Trying No GIL on Scientific Programming

Get the slides at slides.com/cheukting_ho/trying-no-gil/

Do you know what the GIL is?

What is the GIL?

  • Global Interpreter Lock
  • Only one operating system thread can execute Python bytecode at a time
  • This means only one thread can access Python objects at any moment
  • Imagine one thread is adding an object while another is deleting it: a lock is needed
  • Other runtimes may use many smaller locks instead, but that is more complicated than a single GIL
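The GIL acts as the lock from the last two bullets for the whole interpreter. A minimal sketch of the same idea applied to one shared container, using only `threading.Lock` from the standard library (`SharedInventory` is a hypothetical name for illustration). Even with the GIL, `get` followed by a store is two separate steps, so a thread switch can happen in between and the lock is still needed:

```python
import threading


class SharedInventory:
    """A dict that several threads add to and remove from.

    One lock guards every mutation: the same idea as the GIL, but scoped to a
    single object instead of the whole interpreter."""

    def __init__(self) -> None:
        self._items: dict = {}
        self._lock = threading.Lock()

    def add(self, key: str) -> None:
        with self._lock:  # read-modify-write must not interleave with other threads
            self._items[key] = self._items.get(key, 0) + 1

    def remove(self, key: str) -> None:
        with self._lock:
            count = self._items.get(key, 0)
            if count <= 1:
                self._items.pop(key, None)  # deleting while another thread adds is safe here
            else:
                self._items[key] = count - 1

    def count(self, key: str) -> int:
        with self._lock:
            return self._items.get(key, 0)


def adder(inv: SharedInventory, n: int) -> None:
    for _ in range(n):
        inv.add("x")


inv = SharedInventory()
threads = [threading.Thread(target=adder, args=(inv, 10_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(inv.count("x"))  # 40000: no lost updates
```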

What is nogil Python?

  • A fork of CPython 3.9
  • 4th attempt; previous ones by Greg Stein (1996), Adam Olsen (2007) and Larry Hastings (2016)
  • By Sam Gross
  • Why no GIL? Make use of multiple cores => SPEED
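A minimal sketch of the kind of pure-Python, CPU-bound work the GIL serializes, using only the standard library (`busy_sum` and `timed_run` are hypothetical names). On a standard CPython build the 4-thread run is typically about as slow as the 1-thread run, because only one thread executes bytecode at a time; a build without the GIL, such as the nogil fork, is intended to let these threads run on separate cores. No timings are claimed here:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def busy_sum(n: int) -> int:
    """Pure-Python, CPU-bound work: on a standard build it holds the GIL throughout."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed_run(n_tasks: int, n: int, workers: int) -> float:
    """Run `n_tasks` copies of busy_sum(n) on `workers` threads; return seconds taken."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(busy_sum, [n] * n_tasks))
    elapsed = time.perf_counter() - start
    if len(set(results)) != 1:
        raise RuntimeError("threads disagreed on the result")
    return elapsed


if __name__ == "__main__":
    one = timed_run(4, 1_000_000, workers=1)
    four = timed_run(4, 1_000_000, workers=4)
    print(f"1 thread: {one:.2f} s, 4 threads: {four:.2f} s")
```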

Design and challenges

  • Must perform well for both single-threaded and multi-threaded code
  • Challenge: reference counting. Solution: biased reference counting
  • Make commonly used objects (such as None, True, False and small integers) immortal: no reference counting
  • Use deferred reference counting for some objects: counts are added during GC
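A toy model of biased reference counting, written in Python only to show the idea (the fork implements it in C; the class and attribute names here are hypothetical). The thread that created an object updates a "local" count with no synchronisation, while every other thread updates a "shared" count that needs an atomic operation, modelled here with a lock. Immortal objects skip counting entirely:

```python
import threading


class BiasedRefCount:
    """Toy biased reference count: illustration only, not CPython's code."""

    def __init__(self, immortal: bool = False) -> None:
        self.owner = threading.get_ident()  # the creating thread owns the object
        self.local = 1                      # the creator holds the first reference
        self.shared = 0
        self.immortal = immortal
        self._shared_lock = threading.Lock()  # stands in for an atomic instruction

    def incref(self) -> None:
        if self.immortal:
            return  # immortal objects are never counted
        if threading.get_ident() == self.owner:
            self.local += 1  # fast path: only the owner touches `local`
        else:
            with self._shared_lock:
                self.shared += 1  # slow path: other threads synchronise

    def decref(self) -> None:
        if self.immortal:
            return
        if threading.get_ident() == self.owner:
            self.local -= 1
        else:
            with self._shared_lock:
                self.shared -= 1

    def total(self) -> int:
        """The object could be freed once local + shared reaches zero."""
        with self._shared_lock:
            return self.local + self.shared


if __name__ == "__main__":
    obj = BiasedRefCount()
    others = [threading.Thread(target=obj.incref) for _ in range(3)]
    for t in others:
        t.start()
    for t in others:
        t.join()
    print(obj.local, obj.shared)  # 1 3
```

The point of the split is that the common case (the owner thread touching its own objects) never pays for an atomic operation.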

Design and challenges

  • Challenge: thread safety for containers such as dict and list
  • Solution: small, per-object locks
  • Lock ordering is written manually using the CPython API
  • Python's built-in allocator pymalloc is replaced with mimalloc for thread safety
  • The garbage collector needs to "stop the world" (pause all threads)
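Per-object locks bring a classic risk: two threads locking the same two objects in opposite orders can deadlock. A generic sketch of consistent lock ordering in Python (not the fork's C code; `LockedList`, `locks_in_order` and `extend` are hypothetical names), ordering locks by `id()`:

```python
import threading
from typing import Iterable, List, Optional


class LockedList:
    """A list guarded by its own small lock instead of one global lock."""

    def __init__(self, items: Optional[Iterable] = None) -> None:
        self.items = list(items or [])
        self.lock = threading.Lock()


def locks_in_order(*objs: LockedList) -> List[threading.Lock]:
    """Return each distinct object's lock, sorted by a fixed global order (id),
    so every thread acquires them in the same sequence."""
    unique = {id(o): o for o in objs}  # de-duplicate: a Lock is not re-entrant
    return [unique[k].lock for k in sorted(unique)]


def extend(dst: LockedList, src: LockedList) -> None:
    """dst.items.extend(src.items) while holding both locks."""
    locks = locks_in_order(dst, src)
    for lock in locks:
        lock.acquire()
    try:
        dst.items.extend(src.items)
    finally:
        for lock in reversed(locks):
            lock.release()


a, b = LockedList([1, 2]), LockedList([3])
extend(a, b)
print(a.items)  # [1, 2, 3]
```

Because `extend(a, b)` and `extend(b, a)` take the two locks in the same order, running them concurrently cannot deadlock.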

How does it perform for scientific programs?

  • Most scientific packages use C extension modules or Cython for speed
  • Do such programs benefit from removing the GIL?
  • Test it on some popular scientific workloads

How does it perform for scientific programs?

  • Try examples using scikit-learn, NumPy and SciPy
  • Try a neural network example
  • Not a rigorous benchmark
  • Compare nogil, the original CPython 3.9 and CPython 3.11
  • The code I tested is on GitHub
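The tests below report averages over repeated runs. A minimal sketch of such a timing harness using only the standard library (this is not the code from the GitHub repository; `benchmark` is a hypothetical name). Running the same script once under each interpreter (nogil, 3.9, 3.11) gives comparable numbers:

```python
import statistics
import sys
import time
from typing import Callable


def benchmark(fn: Callable[[], object], repeats: int = 50) -> dict:
    """Call `fn` `repeats` times and summarise wall-clock timings in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn()
        timings.append(time.perf_counter() - start)
    return {
        "python": sys.version.split()[0],  # which interpreter produced the numbers
        "mean": statistics.mean(timings),
        "stdev": statistics.stdev(timings) if repeats > 1 else 0.0,
        "min": min(timings),
    }


if __name__ == "__main__":
    print(benchmark(lambda: sum(range(10_000))))
```

Keeping the standard deviation alongside the mean shows how much of a gap between interpreters is just run-to-run noise.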

Test #1 - SVM

We use the scikit-learn example "Recognizing hand-written digits"

- Average over 50 runs

| No GIL | CPython 3.9 | CPython 3.11 |
|---|---|---|
| 0.0313460 s | 0.0314445 s | 0.0318856 s |

* All tests were run on my old MacBook Pro (Retina, 13-inch, Early 2015) with a 3.1 GHz dual-core Intel Core i7

No significant difference
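The slide reports only averages. One way to make "no significant difference" checkable, assuming the individual timings per interpreter are kept, is Welch's t statistic. A minimal standard-library sketch (`welch_t` is a hypothetical name); as a rough rule, |t| below about 2 means the gap is within run-to-run noise at roughly the 5% level for samples of around 50 runs:

```python
import math
import statistics
from typing import Sequence


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's t statistic for two independent timing samples
    (the two interpreters may have different variances)."""
    se2 = statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    diff = statistics.mean(a) - statistics.mean(b)
    if se2 == 0:  # identical, noise-free samples: avoid dividing by zero
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / math.sqrt(se2)
```

A large negative value means the first sample is clearly faster; a value near zero supports "no significant difference".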

Test #2 - Clustering

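We use the scikit-learn example "A demo of K-Means clustering on the handwritten digits data"

- Average over 50 runs

| Initialization | No GIL | CPython 3.9 | CPython 3.11 |
|---|---|---|---|
| k-means++ | 0.060 s | 0.060 s | 0.060 s |
| random | 0.038 s | 0.035 s | 0.034 s |
| PCA-based | 0.015 s | 0.016 s | 0.014 s |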

No significant difference

Test #3 - Decision Tree

We use the Iris dataset from the scikit-learn example "Plot the decision surface of decision trees trained on the iris dataset"

- Averaged over all pairs of features

| No GIL | CPython 3.9 | CPython 3.11 |
|---|---|---|
| 0.4832 to 0.6557 ms | 0.4588 to 0.5651 ms | 0.4669 to 0.5748 ms |
| 🥉 | 🥇 | 🥈 |
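This test reports a range rather than a single number. The scikit-learn example fits one tree per pair of the four Iris features, which gives six pairs. A minimal sketch of one way to turn per-pair timings into such a range, assuming a hypothetical `fake_fit` workload in place of fitting a real `DecisionTreeClassifier` (this is not the author's benchmark code):

```python
import time
from itertools import combinations
from typing import Callable, Tuple

# The four Iris features.
FEATURES = ["sepal length", "sepal width", "petal length", "petal width"]


def timing_range(fit_on_pair: Callable[[Tuple[int, int]], object],
                 repeats: int = 10) -> Tuple[float, float]:
    """Average `repeats` runs for each feature pair, then return the fastest
    and slowest per-pair averages in milliseconds."""
    averages = []
    for pair in combinations(range(len(FEATURES)), 2):  # 6 pairs
        start = time.perf_counter()
        for _ in range(repeats):
            fit_on_pair(pair)
        averages.append((time.perf_counter() - start) / repeats * 1000)
    return min(averages), max(averages)


def fake_fit(pair: Tuple[int, int]) -> int:
    # Hypothetical stand-in for training a decision tree on one feature pair.
    return sum(i * i for i in range(2_000 * (pair[0] + 1)))


low, high = timing_range(fake_fit)
print(f"{low:.4f} to {high:.4f} ms")
```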

Test #4 - Linear algebra

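We use the tutorial "Linear algebra on n-dimensional arrays"

- Average over 50 runs

| Operation | No GIL | CPython 3.9 | CPython 3.11 |
|---|---|---|---|
| SVD | 0.642902 s | 0.392427 s | 0.432239 s |
| Norm | 0.062048 s | 0.048787 s | 0.047184 s |
| Transpose | 2.574921 s | 3.476143 s | 1.983642 s |
| Ranking | 🥈 | 🥉 | 🥇 |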

Test #5 - Image filters

We use the tutorial "X-ray image processing"

- Average over 50 runs

| Filter | No GIL | CPython 3.9 | CPython 3.11 |
|---|---|---|---|
| Laplacian-Gaussian | 0.033855 s | 0.034427 s | 0.035784 s |
| Gaussian gradient magnitude | 0.062575 s | 0.067955 s | 0.089838 s |
| Sobel filter | 0.062575 s | 0.063552 s | 0.061187 s |
| Canny filter | 0.065150 s | 0.060557 s | 0.066754 s |

Test #6 - MLPClassifier

We use the scikit-learn example "Compare Stochastic learning strategies for MLPClassifier"

- Average over 10 runs

| No GIL | CPython 3.9 | CPython 3.11 |
|---|---|---|
| 4.98540 s | 3.32853 s | 3.89133 s |
| 🥉 | 🥇 | 🥈 |

Does this mean removing the GIL does not help?

Why we didn't see much difference

  • C extensions already run multi-threaded code internally
  • C extensions may still expect a GIL
  • They need to be adapted to no-GIL mode
  • Compatibility can be an issue
  • Only tested on a dual-core machine
  • The nogil Python fork is still a work in progress
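The first bullet is why the GIL is often not the bottleneck: C extensions can release the GIL around heavy C code. A standard-library illustration (not one of the scientific packages tested): CPython's `hashlib` releases the GIL while hashing data larger than 2047 bytes, so these threads can already overlap on a normal GIL build. The buffer sizes are arbitrary choices:

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor


def digest(block: bytes) -> str:
    # hashlib drops the GIL for large inputs, so several of these calls
    # can run on different cores at once even with the GIL present.
    return hashlib.sha256(block).hexdigest()


blocks = [os.urandom(4 * 1024 * 1024) for _ in range(4)]  # four 4 MiB buffers

with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(digest, blocks))

sequential = [digest(b) for b in blocks]
print(threaded == sequential)  # True
```

When most of a program's time is spent in code like this, removing the GIL changes little, which matches the results above.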

By Cheuk Ting Ho
