Python GIL rationale

Written by: Igor Korotach

Why do we need the GIL?

In CPython, the global interpreter lock, or GIL, is a mutex that protects access to Python objects, preventing multiple threads from executing Python bytecode at once.

The GIL prevents race conditions on the interpreter's internal state and so keeps CPython itself thread-safe.

We know race conditions, right?
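
As a quick refresher, here is a minimal sketch of a lost update. It is not real threading: two generators stand in for two threads, and the `yield` marks a point where a thread switch is assumed to happen. The names `increment` and `run_interleaved` are made up for this illustration.

```python
def increment(shared):
    """One simulated thread doing `counter += 1` as two separate steps."""
    value = shared["counter"]      # step 1: read the current value
    yield                          # a thread switch is assumed to happen here
    shared["counter"] = value + 1  # step 2: write back a possibly stale value


def run_interleaved():
    shared = {"counter": 0}
    t1, t2 = increment(shared), increment(shared)
    next(t1)        # "thread 1" reads 0
    next(t2)        # "thread 2" also reads 0, before thread 1 has written
    next(t1, None)  # "thread 1" writes 0 + 1
    next(t2, None)  # "thread 2" writes 0 + 1 again, one update is lost
    return shared["counter"]


print(run_interleaved())  # 1, although two increments ran
```

The same read-modify-write pattern is what goes wrong inside an interpreter when two real threads touch the same internal counter without a lock.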

But why the hell do we need thread safety?

Example 1

Python Code

def main():
    num = 10
    num_reference = None
    # num_reference = num
    print(num_reference)

if __name__ == '__main__':
    main()

Output

None

Process finished with exit code 0

C Code

#include <stdio.h>

int main() {
    int num = 10;
    int *num_pointer = NULL;
    // num_pointer = &num;
    printf("%d", *num_pointer);  // dereferencing NULL: undefined behaviour
    return 0;
}

Output

Process finished with exit code 139
(interrupted by signal 11: SIGSEGV)

// SIGSEGV - segmentation fault

Example 2

Python Code

def main():
    array = [1, 2, 3]
    print(array[3])  # index 3 is one past the end

if __name__ == '__main__':
    main()

Output

Traceback (most recent call last):
  File ".../main.py", line 6, in <module>
    main()
  File ".../main.py", line 3, in main
    print(array[3])
          ~~~~~^^^
IndexError: list index out of range

C Code

#include <stdio.h>

int main(void) {
    int arr[2];
    arr[3] = 10;  // out of bounds: undefined behaviour, but no error is raised

    printf("%d", arr[3]);
    return 0;
}

Output

70303616
Process finished with exit code 0

Which Python things are prone to these problems?

Non-thread-safe Python things

  • Reference counting
  • Memory allocator
  • Garbage collector
  • MRO (method resolution order)
  • Collections (list, dict, set)
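
To make the first item concrete, here is a small sketch that watches an object's reference count move as references come and go. It assumes a standard CPython build; `refcount_changes` is a made-up helper name, and only the differences between counts are meaningful, because `sys.getrefcount` itself adds a temporary reference.

```python
import sys


def refcount_changes():
    obj = object()
    before = sys.getrefcount(obj)
    holders = [obj, obj, obj]        # the list stores three new references
    during = sys.getrefcount(obj)
    del holders                      # the list is freed, its references go away
    after = sys.getrefcount(obj)
    return during - before, after - before


print(refcount_changes())  # (3, 0) on CPython
```

Each of those changes is an ordinary, non-atomic increment or decrement of a counter stored in the object. Without a lock around them, two threads could lose an update in the same way as in any other read-modify-write race, and an object could be freed while still in use.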

The GIL Solution

# I protect all the GC, Reference Counting, Memory Allocation, etc.
mutex = GlobalPythonMutex()

Thread 1:

mutex.lock()
# I do a lot of Python work here!
mutex.unlock()

Thread 2:

mutex.lock()
# Now I do a lot of Python work!
mutex.unlock()
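
The pseudocode above can be approximated in runnable form. This is only a sketch: `big_lock` is a hypothetical stand-in built on `threading.Lock`, not the real GIL, and `worker` and `run` are names invented for the example.

```python
import threading

big_lock = threading.Lock()  # hypothetical stand-in for the one global mutex
counter = 0


def worker(iterations):
    """Do 'Python work' only while holding the single global lock."""
    global counter
    for _ in range(iterations):
        with big_lock:       # mutex.lock() ... mutex.unlock()
            counter += 1


def run(n_threads=4, iterations=10_000):
    global counter
    counter = 0
    threads = [
        threading.Thread(target=worker, args=(iterations,))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


print(run())  # 40000: no update is lost, but only one thread works at a time
```

The design trade-off is visible here: a single lock makes the shared state trivially safe, at the price of serialising every thread that wants to touch it.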

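When do GIL switches happen?

  1. Roughly every 5 milliseconds (the default switch interval), at a safe bytecode boundary, if another thread is waiting for the GIL
  2. When the stop-the-world garbage collector needs to run
  3. In C code, both user-defined extensions and the interpreter's own native modules, which can release the GIL explicitly, as pysleep() does below
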
/* Implement pysleep() for various platforms.
   When interrupted
   (or when another error occurs),
   return -1 and
   set an exception; else return 0. */

static int
pysleep(_PyTime_t secs)
{
    // ...
    /* Allow sleep(0) to maintain win32
     * semantics, and as decreed
     * by Guido, only the main thread can
     * be interrupted.
     */
    ul_millis = (unsigned long)millisecs;
    if (ul_millis == 0 || !_PyOS_IsMainThread()) {
        Py_BEGIN_ALLOW_THREADS
        Sleep(ul_millis);
        Py_END_ALLOW_THREADS
        break;
    }
    // ...

    return 0;
}
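
The effect of `Py_BEGIN_ALLOW_THREADS` can be observed from Python. The sketch below reads the switch interval and then runs several threads that only sleep; `sleep_in_threads` is a made-up helper, and the timing is approximate and depends on the machine.

```python
import sys
import threading
import time


def sleep_in_threads(n_threads=4, seconds=0.2):
    """Start n_threads that each sleep, and return the total wall-clock time."""
    start = time.perf_counter()
    threads = [
        threading.Thread(target=time.sleep, args=(seconds,))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


# 0.005 seconds unless it was changed with sys.setswitchinterval()
print(sys.getswitchinterval())

elapsed = sleep_in_threads()
print(f"{elapsed:.2f}s")  # close to 0.2s rather than 0.8s
```

Because the sleeping thread gives up the GIL, the four sleeps overlap instead of running one after another.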

Why GIL?

The GIL provides an important simplifying model of object access (including refcount manipulation) because it ensures that only one thread of execution can mutate Python objects at a time.

Moreover, the alternative of synchronising each object separately, so that it can be used safely across parallel threads, adds locking overhead that slows down single-threaded applications too.
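
A rough way to get a feel for that overhead is to compare a plain counter with one that takes its own lock on every update, both used from a single thread. This is a sketch under loud assumptions: `PlainCounter`, `LockedCounter`, `count` and `compare` are invented for the illustration, and a per-object `threading.Lock` is only an analogy for what a GIL-free interpreter would need internally.

```python
import threading
import timeit


class PlainCounter:
    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1


class LockedCounter:
    """Same counter, but every update takes a per-object lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:
            self.value += 1


def count(counter, n):
    for _ in range(n):
        counter.increment()
    return counter.value


def compare(n=100_000):
    """Return (plain_seconds, locked_seconds) for single-threaded counting."""
    plain = timeit.timeit(lambda: count(PlainCounter(), n), number=3)
    locked = timeit.timeit(lambda: count(LockedCounter(), n), number=3)
    return plain, locked


plain, locked = compare()
print(f"plain: {plain:.3f}s, locked: {locked:.3f}s")
```

Both counters produce the same result; the locked one typically takes longer even though no second thread ever competes for the lock. Exact figures vary by machine and Python version.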

Why GIL, TL;DR

You don't need to worry about safety if you don't use parallelism.

Thanks for your attention. You've been awesome!

Questions?

  • Presentation link: https://slides.com/emulebest/python-gil-rationale
