The Bridges between Python and C/C++
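- The speaker mixes Chinese and English when talking, and so do these slides
- What I say is not necessarily right; please interrupt me any time you have a question
- I originally planned to cover pybind11 usage and implementation, but I got increasingly worked up while writing the slides...
- You can find the code shown in these slides here.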
Before we start...

- What's Python?
- Python's ecosystem
- Extending Python using C/C++
- Embedding Python in C/C++ apps
- Speeding up Python
We will try to cover...
What the hell is python?
Python: a programming language with a defined syntax and specification
python: a program
Namely the default CPython interpreter that processes Python source code and serves as the reference implementation.


Late 1980s
Python conceived
1991
Python 0.9.0 released, the first implementation
1994
Python 1.0 released, including the functional-programming tools lambda, map, filter and reduce
2000
Python 2.0 introduced list comprehensions and a cycle-detecting garbage collector
2008
Python 3 fixed fundamental design flaws and is not backward compatible
2020
Python 2 EOL
Python history
- Simple syntax: easy to learn and use
- Interpreted: your code is evaluated line by line
- Rich ecosystem
Why is Python so popular?



Compiled Language
Interpreted Language
"Python is an interpreted language"
To be precise, it just hides compilation from you
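The hidden compilation step can be made visible from Python itself. The following is a minimal sketch using only the standard library; the source string and the `<slide-example>` file name are made up for illustration, and the exact opcodes printed by `dis` differ between CPython versions.

```python
import dis

source = "total = 1 + 2\n"

# compile() turns source text into a code object holding CPython bytecode
code = compile(source, "<slide-example>", "exec")
print(type(code).__name__)    # the type name of the compiled code object
print(len(code.co_code) > 0)  # the raw bytecode is a non-empty bytes string

# dis prints the bytecode in human-readable form (opcode names vary by version)
dis.dis(code)

# The interpreter then executes the compiled bytecode
namespace = {}
exec(code, namespace)
print(namespace["total"])
```

So "interpreted" here means that the compiled bytecode is run by the CPython virtual machine, not that source lines are evaluated directly.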

python (CPython interpreter) is implemented in C!

Extending Python
- CPython is accompanied by an API that enables communication between Python and C (and basically any language that C can interop with).
- Extending Python: the process of producing compiled code for use by CPython.
- The compiled pieces of code to be used are known as "extension modules".
  - .so on Linux/Mac
  - .pyd on Windows
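To check which extension-module file suffixes a given interpreter accepts, the standard library exposes them directly. This is a small sketch; the printed values depend on the platform and Python build.

```python
import importlib.machinery
import sys

# File suffixes this interpreter accepts for compiled extension modules
suffixes = importlib.machinery.EXTENSION_SUFFIXES
print(sys.platform, suffixes)

# Modules compiled directly into the interpreter binary (no separate file)
print(sys.builtin_module_names[:5])
```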
Solution | Brief Description |
---|---|
Python C API | Native C API provided by CPython |
CFFI/ctypes | Foreign function library, calling C APIs from Python |
SWIG | Reads C/C++ header files and generates a library that Python can load |
pybind11/nanobind/boost.python | Write C++ libraries that Python can import and use easily |
import my_module
my_module.add(1, 2)
Extend Python with C/C++
CFFI/ctypes: call C functions from Python
(not an extension module, not Python-aware, pure .so)
    import ctypes

    # Load the plain shared library; it knows nothing about Python
    sum_from_ctypes = ctypes.CDLL('libsum_from_ctypes.so')
    # Declare the C signature: int my_sum(int num_numbers, int *numbers)
    sum_from_ctypes.my_sum.argtypes = (ctypes.c_int, ctypes.POINTER(ctypes.c_int))
    sum_from_ctypes.my_sum.restype = ctypes.c_int

    N = 10
    data = [1] * N
    arr_type = ctypes.c_int * N   # the C array type int[N]
    c_data = arr_type(*data)      # copy the Python list into a C array
    # Call the C function itself; the argument order follows argtypes
    print(sum_from_ctypes.my_sum(N, c_data))
    int my_sum(int num_numbers, int *numbers) {
        int i;
        int sum = 0;
        for (i = 0; i < num_numbers; i++) {
            sum += numbers[i];
        }
        return sum;
    }
    cmake_minimum_required(VERSION 3.4...3.18)
    project(sum)
    add_library(sum_from_ctypes MODULE src/sum.c)
This boilerplate code can be generated from C headers
using ctypesgen
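The ctypes pattern above (load a library, declare the signature, call) can be tried without building anything by pointing it at the C standard library. This is a sketch that assumes a POSIX system; `strlen` stands in for `my_sum` only because it ships with every libc.

```python
import ctypes
import ctypes.util

# Locate the C standard library; on Linux, CDLL(None) falls back to the
# symbols already loaded into the current process (assumes a POSIX system)
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s)
libc.strlen.argtypes = (ctypes.c_char_p,)
libc.strlen.restype = ctypes.c_size_t

length = libc.strlen(b"bridge")
print(length)
```

Declaring `argtypes` and `restype` is what lets ctypes convert the Python `bytes` object and the returned `size_t` correctly; without them ctypes assumes `int` everywhere.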
Why not just use the C API?
Sure, that works!
A sum module written with the C API
    # To build:
    #   python setup.py build && pip install -e .
    import sum_from_cpython

    data = [1] * 10
    sum_from_cpython.my_sum(data)
    # In setup.py
    # setuptools replaces distutils, which was removed in Python 3.12
    from setuptools import setup, Extension

    cpython_mod = Extension('sum_from_cpython', sources=['src/sum_cpython.c'])

    setup(name='demo',
          version='0.0',
          description='This is a demo package',
          ext_modules=[cpython_mod])
Let's unwrap this step by step...
Define a Module in C API
    // File sum_cpython.c
    #include <Python.h>  // must be included before any standard headers
    #include <stdio.h>

    // Module definition
    // The fields of this structure tell Python what to call your extension,
    // what its methods are and where to look for its method definitions.
    // (sum_from_cpython_methods is shown on the next slide; in the real file
    // it must be defined above this structure.)
    static struct PyModuleDef sum_from_cpython_definition = {
        PyModuleDef_HEAD_INIT,
        "sum_from_cpython",
        "A Python module that sums a list of integers in C code.",
        -1,
        sum_from_cpython_methods
    };

    // Module initialization
    // Python calls this function when importing your extension. It is important
    // that this function is named PyInit_[[your_module_name]] exactly, and matches
    // the name passed to Extension() in setup.py.
    // The interpreter is already running at import time, so Py_Initialize()
    // must not be called here.
    PyMODINIT_FUNC PyInit_sum_from_cpython(void) {
        return PyModule_Create(&sum_from_cpython_definition);
    }
Defining a method in the module
    // Module method definitions
    static PyObject* my_sum_from_cpython(PyObject *self, PyObject *args) {
        PyObject *pList;
        PyObject *pItem;
        Py_ssize_t n;
        Py_ssize_t i;  // same type as n, so large lists cannot overflow the index

        if (!PyArg_ParseTuple(args, "O!", &PyList_Type, &pList)) {
            PyErr_SetString(PyExc_TypeError, "parameter must be a list");
            return NULL;
        }

        n = PyList_Size(pList);
        long sum = 0;
        for (i = 0; i < n; i++) {
            // PyList_GetItem returns a borrowed reference: no Py_DECREF needed
            pItem = PyList_GetItem(pList, i);
            if (!PyLong_Check(pItem)) {
                // Returning NULL without setting an exception raises SystemError
                PyErr_SetString(PyExc_TypeError, "list items must be integers");
                return NULL;
            }
            long value = PyLong_AsLong(pItem);
            if (value == -1 && PyErr_Occurred()) {
                return NULL;  // e.g. OverflowError for values that do not fit in a long
            }
            sum += value;
        }
        return Py_BuildValue("l", sum);
    }

    // Method definition object for this extension, these arguments mean:
    // ml_name: The name of the method
    // ml_meth: Function pointer to the method implementation
    // ml_flags: Flags indicating special features of this method, such as
    //           accepting arguments, accepting keyword arguments, being a
    //           class method, or being a static method of a class.
    // ml_doc: Contents of this method's docstring
    static PyMethodDef sum_from_cpython_methods[] = {
        { "my_sum", my_sum_from_cpython, METH_VARARGS, "" },
        {NULL, NULL, 0, NULL}
    };
Seems Fine? But reality is...
Cons:
- Manual ref counting
- Manual exception handling
- A lot of boilerplate to define functions and modules
=> a looooooot of code and manual work!

SWIG: takes in C/C++ header and generates wrapper code
(generates an extension module for Python)
    from swig_example import my_add

    my_add(1, 2)
    # Note that I tried to implement my_sum, which takes a list[int] argument,
    # but that seems pretty complicated.
    # https://www.swig.org/Doc3.0/Python.html#Python_nn59
    /* swig_example.i */
    %module swig_example
    %{
    /* Put header files here or function declarations like below */
    extern int my_add(int a, int b);
    %}

    extern int my_add(int a, int b);
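    import subprocess

    from pybind11.setup_helpers import Pybind11Extension
    from setuptools import setup

    # Run SWIG first so the wrapper sources exist before building
    subprocess.run(["swig", "-outcurrentdir", "-python", "src/swig_example.i"], check=True)

    # A plain setuptools Extension would also work for this C wrapper
    swig_mod = Pybind11Extension('_swig_example',
                                 sources=['swig_sum_wrap.c', 'src/sum.c'])

    setup(name='demo',
          version='0.0',
          description='This is a demo package',
          ext_modules=[swig_mod],
          py_modules=['swig_example'])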
Pybind11:
header-only, supports C++11/14/17 features & STL data types, great portability
    #include <pybind11/pybind11.h>
    #include <pybind11/stl.h>

    namespace py = pybind11;

    int add(int a, int b) {
        return a + b;
    }

    PYBIND11_MODULE(pybind11_demo, m) {
        m.def("add", &add, "Binary add");
    }
    from pybind11.setup_helpers import Pybind11Extension
    from setuptools import setup

    pybind_demo_mod = Pybind11Extension(
        'pybind11_demo',
        sources=['src/pybind11_demo.cpp'])

    setup(name='demo',
          version='0.0',
          description='This is a demo package',
          ext_modules=[pybind_demo_mod])

    struct Response {
        int status;
        std::string reason;
        std::string text;

        Response(int status, std::string reason, std::string text = " ")
            : status(status), reason(std::move(reason)), text(std::move(text)) {}
        Response() : Response(200, "OK") {}
    };

    PYBIND11_MODULE(pybind11_demo, m) {
        py::class_<Response>(m, "Response")
            .def(py::init<>())
            .def(py::init<int, std::string>())
            .def(py::init<int, std::string, std::string>())
            .def_readonly("status", &Response::status)
            .def_readonly("reason", &Response::reason)
            .def_readonly("text", &Response::text);
    }
Pybind11: Class
    from pybind11_demo import Response

    a = Response()
    print(a.status)  # prints "200"
    # a.status = 100
    # The line above will error out as expected
    # Traceback (most recent call last):
    #   File "demo_pybind11.py", line 7, in <module>
    #     a.status = 100
    # AttributeError: can't set attribute
    PYBIND11_MODULE(pybind11_demo, m) {
        m.def("count_strings", [](py::list list) {
            int n = 0;
            for (auto item : list) {
                if (py::isinstance<py::str>(item)) {
                    n++;
                }
            }
            return n;
        });
        m.def("tp", [](int x) { return "int"; });
        m.def("tp", [](float x) { return "float"; });
    }
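Pybind11: Python objects as input, overloading
    from pybind11_demo import count_strings, tp

    print('count_strings: ', count_strings(["hello", "world"]))
    print('overloading tp function:', tp(2), tp(2.1))
    tp('hello')  # ERROR! No overload accepts a str, so this raises TypeError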
Pybind11: Type conversions
- Use a native C++ type everywhere. In this case, the type must be wrapped using pybind11-generated bindings so that Python can interact with it.
- Use a native Python type everywhere. It will need to be wrapped so that C++ functions can interact with it.
- Use a native C++ type on the C++ side and a native Python type on the Python side, and let pybind11 convert between the two. (⚠️ ALWAYS REQUIRES A COPY)

    py::class_<Foo>(m, "Foo");
    m.def("f1", [](const Foo& foo) {...})

    m.def("f2", [](py::list list) {...})

    m.def("f3", [](int x) {...})
    m.def("f4", [](const std::string& x) {...})
    m.def("f5", [](const std::vector<int>& x) {...})
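What "always requires a copy" means in practice can be illustrated from the Python side. The sketch below uses ctypes as a rough analogy only (it is not how pybind11 performs its conversions): converting a Python list into a C array copies the elements, so changes made on the C side never reach the original list.

```python
import ctypes

numbers = [1, 2, 3]

# Converting the Python list into a C int array copies every element
IntArray3 = ctypes.c_int * len(numbers)
c_numbers = IntArray3(*numbers)

# Mutating the C-side copy leaves the original Python list untouched
c_numbers[0] = 99
print(list(c_numbers))
print(numbers)
```

The same reasoning applies to a function like `f5` above: it works on a converted copy, so any in-place modification would not be visible to the Python caller.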
Pybind11: and more...
- Functions accepting and returning custom data structures per value, reference, or pointer
- Instance methods and static methods
- Overloaded functions
- Instance attributes and static attributes
- Arbitrary exception types
- Enumerations
- Callbacks
- Iterators and ranges
- Custom operators
- Single and multiple inheritance
- STL data structures
- Smart pointers with reference counting like std::shared_ptr
- Internal references with correct reference counting
- C++ classes with virtual (and pure virtual) methods can be extended in Python
- ...
Solution | Brief Description | Scenarios |
---|---|---|
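Python C API | Requires writing C; complex, but gives the best performance | When you need maximum performance |
CFFI/ctypes | Requires writing or generating boilerplate code in Python | Calling an existing shared library with no complex data structures, just a few functions |
SWIG | Automatically generates the binding code for the extension module; requires an interface file, which has limited support and is hard to write | When you need bindings for many languages from a single interface file |
pybind11/nanobind/boost.python | Easy to write and supports C++ features, at some cost in performance | Complex data structures and frequent interaction between Python and C/C++ |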
Summary
The bridge is bi-directional,
but calling CPython in C/C++...

- Extension modules (CPython)
  - Python C API
- Python-aware compilation frameworks that generate machine code from Python source
  - Numba (JIT-compiles Python functions to machine code via LLVM)
  - Cython (a static compiler for both the Python language and Cython, an extended superset of it)
  - TorchScript
  - Taichi
Speed up your Python programs
For perf of extension modules, let's first introduce
THE GIL
GIL: why?
Here's some context:
- Python predates the multi-core processor era, which took off around 2004
- Python chose reference counting for memory management, and then introduced the GIL to keep it safe under multi-threaded execution.
- Let's take another step back: what's memory management?
Memory Management
Basic terminology
- Garbage: allocated memory that is no longer necessary.
- Garbage recovery should neither recover space too early [soundness] nor too late [completeness]
Memory Management techniques
Good luck, you're on your own!
1. Fully manual
malloc/free in C, new/delete in C++
2. Semi-automated
Reference count:
Every object contains a counter which is incremented when a reference to the object is stored somewhere, and decremented when a reference to the object is deleted.
When the counter reaches zero, the object is freed.
You don't need to free manually, but ref inc/dec is still your work!
3. Fully automated:
Language's runtime does it for you.
Garbage Collection:
Tracing which objects are reachable from certain root objects
Java
Ref count
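The semi-automated reference-count technique described above can be sketched in a few lines. `RefCounted` is a hypothetical toy class written for illustration; it is not how CPython stores its counters, it only mimics the increment, decrement and free-at-zero rule.

```python
class RefCounted:
    """Toy object with a manually managed reference count (illustration only)."""

    def __init__(self, name):
        self.name = name
        self.refcount = 1       # the creator holds the first reference
        self.freed = False

    def incref(self):
        self.refcount += 1

    def decref(self):
        self.refcount -= 1
        if self.refcount == 0:  # last reference gone: release the object
            self.freed = True


buf = RefCounted("buffer")
buf.incref()        # a second owner stores a reference
buf.decref()        # the first owner drops its reference
still_alive = not buf.freed
buf.decref()        # the second owner drops its reference, count hits zero
print(still_alive, buf.freed)
```

Forgetting one `decref` leaks the object, and calling one too many frees it while still in use; those are the two mistakes that manual reference counting makes possible.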
More automatic?
C++11 provides reference counted smart pointers, via std::shared_ptr class, enabling automatic shared memory-management of dynamically allocated objects.
CPython: there's no truly portable automatic GC in C, so it went with a refined reference-counting solution:
- Automated insertion of reference increment and decrement
- Cycle detector
⚠️ If you're using Python C-API, you have to do reference increment and decrement yourself.
    import sys

    a = [1]
    b = a
    # prints 3: the names a and b, plus the temporary reference held by the call
    print(sys.getrefcount(a))
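The cycle detector mentioned above exists because reference counting alone never frees objects that refer to each other. A minimal sketch with the standard `gc` and `weakref` modules follows; `Node` is a made-up class for the demonstration, and the behaviour shown is that of CPython.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.other = None


gc.disable()                 # keep the automatic collector out of the way
a, b = Node(), Node()
a.other, b.other = b, a      # build a reference cycle
probe = weakref.ref(a)       # lets us observe a without keeping it alive

del a, b                     # drop our names; the cycle keeps both counts above zero
alive_before = probe() is not None

gc.collect()                 # the cycle detector finds and frees the pair
alive_after = probe() is not None
gc.enable()

print(alive_before, alive_after)
```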
Reference counts easily get messed up in a multi-threaded environment
💥 Race condition: two threads try to increment/decrement the counter of one object simultaneously.
✌️Add lock(s)!
💩 If you add a lock per object, you run into another problem: deadlocks
✅ Python introduced the infamous GIL (global interpreter lock)! In other words, you have to hold the GIL to run some Python bytecode!
❓ Why not atomic inc/dec?
"I'd welcome a set of patches into Py3k only if the performance for a single-threaded program (and for a multi-threaded but I/O-bound program) does not decrease"
from "It isn't Easy to Remove the GIL" by Guido
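The race described above comes from the counter update being a read-modify-write sequence. The sketch below uses a hypothetical `Counter` class as a stand-in for an object's reference count and protects the update with one lock, which is the same idea the GIL applies to the whole interpreter.

```python
import threading


class Counter:
    """Stand-in for an object's reference count (hypothetical, for illustration)."""

    def __init__(self):
        self.value = 0
        self.lock = threading.Lock()

    def incref(self):
        # Read-modify-write: without the lock, two threads could both read
        # the same old value and one of the updates would be lost
        with self.lock:
            current = self.value
            self.value = current + 1


def worker(counter, times):
    for _ in range(times):
        counter.incref()


counter = Counter()
threads = [threading.Thread(target=worker, args=(counter, 10_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)
```

Taking a lock on every increment is exactly the per-operation cost that makes single-threaded programs slower, which is the concern raised in the quote.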

Here's an easy way to speed up your Python program:
You can RELEASE the GIL when calling into your extension modules!
👻 as long as you do it right!
    m.def("long_running", []() {
        sleep(10);
        std::cout << "Done" << std::endl;
    }, py::call_guard<py::gil_scoped_release>());
    import time
    from concurrent.futures import ThreadPoolExecutor

    from pybind11_demo import long_running

    pool = ThreadPoolExecutor()
    for i in range(3):
        print(f'before launching {i}th long_runninng thread', time.time())
        pool.submit(long_running)
        print(f'launched {i}th long_runninng thread', time.time())

    # before launching 0th long_runninng thread 1666534226.04406
    # launched 0th long_runninng thread 1666534226.044158
    # before launching 1th long_runninng thread 1666534226.044163
    # launched 1th long_runninng thread 1666534226.0442119
    # before launching 2th long_runninng thread 1666534226.044217
    # launched 2th long_runninng thread 1666534226.044263
    # Done
    # Done
    # Done
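The same effect can be observed without compiling an extension, because some C functions in the standard library already release the GIL while they block. The sketch below uses `time.sleep` as a stand-in for `long_running`; the 0.25 s duration and the thread count are arbitrary choices for the demonstration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_call():
    # time.sleep is implemented in C and releases the GIL while it waits
    time.sleep(0.25)


start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(blocking_call) for _ in range(4)]
    for future in futures:
        future.result()
elapsed = time.perf_counter() - start

# Run one after another, four calls would need about 1.0 s
print(f"elapsed: {elapsed:.2f}s")
```

If the blocking call held the GIL instead, the four threads would have to take turns and the elapsed time would approach the serial total.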



TL;DR
✅ Most important: know where your performance bottleneck is
❌ Bouncing back and forth between Python and C++
❓ Maybe our pybind11 code has some low-hanging fruit?
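One way to find the bottleneck before moving anything to C++ is the profiler in the standard library. This is a minimal sketch; `slow_sum`, `fast_path` and `main` are made-up functions standing in for a real workload.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    # Deliberately slow pure-Python loop (hypothetical hot spot)
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_path():
    return sum(range(100))


def main():
    slow_sum(200_000)
    fast_path()


profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Sort by cumulative time and keep the report in a string
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

Functions near the top of the cumulative-time listing are the candidates worth rewriting as an extension module; everything else is better left in Python.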
Acknowledgements & readings
- Python For High Performance
- The supreme guide to understand the working of CPython
- Python's infamous GIL by Larry Hastings
- Understanding the Python GIL by David Beazley
- Official Python doc for extending
- The Zen of Python
- Python Regrets by Guido
- Python2 -> Python3 Migration Q&A
- pybind11 - seamless operability between C++11 and Python by Ivan Smirnov
- Memory Management
- Python GIL from realpython
Many resources in these slides are copied from the references above.
Thank you!
Questions?
The Bridges between Python and C/C++
By ailzhang