Y is X but X is not always Y: An introduction to Python internals

Kevin Hernández

Hey I'm

THE AGENDA

  • Python is magical 💫
  • The amazing world of Python implementations 🐍
  • Code structure 🚀
  • What happens when you hit enter ⚡️
  • CPython objects 🔥
  • Know your dictionary 📖
  • Why even if X is Y, Y could be different than X 🤯
  • Python philosophy 🖋
  • Contribute 🆘
  • Thanks for coming to my PyCon talk 🖖
 

💫 Python is magical 💫

The amazing world of Python implementations

CPython

PyPy

MicroPython

Jython
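
To check which of these implementations is running a given script, you can ask the standard library. This is a minimal sketch using only `platform` and `sys`; the printed values depend on the interpreter you launch it with.

```python
import platform
import sys

# Name of the implementation running this script, e.g. "CPython" or "PyPy".
implementation = platform.python_implementation()

# sys.implementation exposes the same information in lowercase form,
# plus the implementation's own version number.
print(implementation)
print(sys.implementation.name)
print(sys.implementation.version)
```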

Code structure

 

cpython/
├── Doc       ← Source for the documentation
├── Grammar   ← Computer-readable language definition
├── Include   ← C header files
├── Lib       ← Standard library modules written in Python
├── Mac       ← macOS support files
├── Misc      ← Miscellaneous files
├── Modules   ← Standard library modules written in C
├── Objects   ← Core types and the object model
├── Parser    ← The Python parser source code
├── PC        ← Windows-specific source and support files
├── PCbuild   ← Windows build files (Visual Studio project files)
├── Programs  ← Source code for the python executable and other binaries
├── Python    ← The CPython interpreter source code
└── Tools     ← Standalone tools useful for building or extending Python

# For normal and annotated assignments, additional restrictions enforced by the interpreter
del_stmt: 'del' exprlist
pass_stmt: 'pass'
flow_stmt: break_stmt | continue_stmt | return_stmt | raise_stmt | yield_stmt
break_stmt: 'break'
continue_stmt: 'continue'
return_stmt: 'return' [testlist_star_expr]
yield_stmt: yield_expr
raise_stmt: 'raise' [test ['from' test]]
import_stmt: import_name | import_from
import_name: 'import' dotted_as_names
# note below: the ('.' | '...') is necessary because '...' is tokenized as ELLIPSIS
import_from: ('from' (('.' | '...')* dotted_name | ('.' | '...')+)
              'import' ('*' | '(' import_as_names ')' | import_as_names))
import_as_name: NAME ['as' NAME]
dotted_as_name: dotted_name ['as' NAME]
import_as_names: import_as_name (',' import_as_name)* [',']
dotted_as_names: dotted_as_name (',' dotted_as_name)*
dotted_name: NAME ('.' NAME)*
global_stmt: 'global' NAME (',' NAME)*
nonlocal_stmt: 'nonlocal' NAME (',' NAME)*
assert_stmt: 'assert' test [',' test]
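
One way to see these grammar rules at work is to parse a few statements with the standard `ast` module and look at the node each one produces. This is only a sketch: the sample statements are made up for illustration, and the AST node names belong to `ast`, not to the grammar file itself.

```python
import ast

# Each sample statement is paired with the AST node type we expect back.
samples = {
    "del x": ast.Delete,
    "pass": ast.Pass,
    "raise ValueError('bad') from None": ast.Raise,
    "from ... import names": ast.ImportFrom,
    "assert x, 'message'": ast.Assert,
}

for source, expected_node in samples.items():
    statement = ast.parse(source).body[0]
    assert isinstance(statement, expected_node)
    print(f"{source!r:40} -> {type(statement).__name__}")

# The three dots of a relative import end up as the import level.
relative = ast.parse("from ... import names").body[0]
print(relative.level)  # 3
```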

What happens when you hit enter

source code → (compiles) → bytecode → (interpreted) → running code 🐍
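
The two steps in the diagram can be tried by hand with the built-ins `compile` and `exec`, and the standard `dis` module. This is a rough sketch: the source string and the `<example>` file name are made up, and the exact bytecode listing changes between Python versions, so no output is shown for it.

```python
import dis

source = "total = 1 + 2"

# Step 1: source code is compiled into a code object holding bytecode.
code = compile(source, "<example>", "exec")
print(type(code.co_code))  # the raw bytecode is a bytes object

# Step 2: the interpreter runs that bytecode.
namespace = {}
exec(code, namespace)
print(namespace["total"])  # 3

# Human-readable listing of the bytecode instructions.
dis.dis(code)
```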

CPython objects

PyObject *
PyObject_Repr(PyObject *v)
{
    PyObject *res;
    if (PyErr_CheckSignals())
        return NULL;
    ...
    if (v == NULL)
        return PyUnicode_FromString("<NULL>");
    if (Py_TYPE(v)->tp_repr == NULL)
        return PyUnicode_FromFormat("<%s object at %p>",
                                    v->ob_type->tp_name, v);
    ...

class Custom:
    def __init__(self, id):
        self.id = id

    def __repr__(self):
        return f"Object({self.id})"
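
To compare the two cases side by side, here is a small sketch with one class that defines `__repr__` and one that does not. `Plain` is a hypothetical class added for the comparison. On CPython, a class without its own `__repr__` inherits one from `object`, which prints in a similar `<... object at 0x...>` shape to the fallback in the C snippet; the address differs on every run.

```python
class Plain:
    """No __repr__ of its own, so the inherited default is used."""


class Custom:
    def __init__(self, id):
        self.id = id

    def __repr__(self):
        return f"Object({self.id})"


print(repr(Plain()))    # something like <__main__.Plain object at 0x...>
print(repr(Custom(7)))  # Object(7)
```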

Know your dictionary

Keys: CPython, Jython, PyPy
Indexes, computed as hash(key) & mask: 7, 3, 5
Dict values, stored as <hash|key|value>:
<hash|CPython|C>
<hash|Jython|java>
<hash|PyPy|Python>
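
The layout in the diagram, a sparse index table pointing into a dense list of `<hash|key|value>` entries, can be modelled in a few lines of Python. This is a toy sketch, not CPython's implementation: `ToyDict` and its attribute names are hypothetical, it uses plain linear probing, and it never resizes or deletes.

```python
EMPTY = -1  # marks an unused slot in the index table


class ToyDict:
    def __init__(self, size=8):
        self.mask = size - 1            # size must be a power of two
        self.indices = [EMPTY] * size   # sparse: slot -> position in entries
        self.entries = []               # dense: (hash, key, value) tuples

    def _probe(self, key):
        """Return (slot, entry position), or (slot, EMPTY) if key is absent."""
        i = hash(key) & self.mask
        while True:
            ix = self.indices[i]
            if ix == EMPTY or self.entries[ix][1] == key:
                return i, ix
            i = (i + 1) & self.mask     # simplification: linear probing

    def __setitem__(self, key, value):
        slot, ix = self._probe(key)
        if ix == EMPTY:
            # Keep one slot free so probing always terminates.
            if len(self.entries) >= self.mask:
                raise OverflowError("toy dict is full; resizing is not modelled")
            self.indices[slot] = len(self.entries)
            self.entries.append((hash(key), key, value))
        else:
            self.entries[ix] = (hash(key), key, value)

    def __getitem__(self, key):
        _, ix = self._probe(key)
        if ix == EMPTY:
            raise KeyError(key)
        return self.entries[ix][2]


d = ToyDict()
d["CPython"] = "C"
d["Jython"] = "java"
d["PyPy"] = "Python"
print(d["Jython"])                  # java
print([key for _, key, _ in d.entries])  # insertion order is preserved
```

Because the entries live in a dense list, iteration order is insertion order, while the slot each key lands in depends on its hash.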

#ifndef Py_DICT_COMMON_H
#define Py_DICT_COMMON_H

typedef struct {
    /* Cached hash code of me_key. */
    Py_hash_t me_hash;
    PyObject *me_key;
    PyObject *me_value; /* This field is only meaningful for combined tables */
} PyDictKeyEntry;

...
...
/* PyDict_MINSIZE is the starting size for any new dict.
 * 8 allows dicts with no more than 5 active entries; experiments suggested
 * this suffices for the majority of dicts (consisting mostly of usually-small
 * dicts created to pass keyword arguments).
 * Making this 8, rather than 4 reduces the number of resizes for most
 * dictionaries, without any significant extra memory use.
 */
#define PyDict_MINSIZE 8
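
The "8 slots, no more than 5 active entries" figure in the comment follows from a two-thirds load factor, which CPython's dictobject.c expresses through its USABLE_FRACTION macro. Here is a small sketch of that arithmetic in Python, where `usable_fraction` is a stand-in name for the macro:

```python
PYDICT_MINSIZE = 8


def usable_fraction(size):
    # Mirrors the integer arithmetic (size * 2) / 3 with truncation.
    return (size << 1) // 3


for size in (PYDICT_MINSIZE, 16, 32):
    print(size, usable_fraction(size))
# 8 5
# 16 10
# 32 21
```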
...
...
for (Py_ssize_t i = 0; i < numentries; i++) {
    assert(oldvalues[i] != NULL);
    PyDictKeyEntry *ep = &oldentries[i];
    PyObject *key = ep->me_key;
    Py_INCREF(key);
    newentries[i].me_key = key;
    newentries[i].me_hash = ep->me_hash;
    newentries[i].me_value = oldvalues[i];
}
...
...
static Py_ssize_t _Py_HOT_FUNCTION
lookdict(PyDictObject *mp, PyObject *key,
         Py_hash_t hash, PyObject **value_addr)
{
    size_t i, mask, perturb;
    PyDictKeysObject *dk;
    PyDictKeyEntry *ep0;

top:
    dk = mp->ma_keys;
    ep0 = DK_ENTRIES(dk);
    mask = DK_MASK(dk);
    perturb = hash;
    i = (size_t)hash & mask;

    for (;;) {
        Py_ssize_t ix = dictkeys_get_index(dk, i);
        if (ix == DKIX_EMPTY) {
            *value_addr = NULL;
            return ix;
        }
        if (ix >= 0) {
            PyDictKeyEntry *ep = &ep0[ix];
            assert(ep->me_key != NULL);
            if (ep->me_key == key) {
                *value_addr = ep->me_value;
                return ix;
            }
            if (ep->me_hash == hash) {
                PyObject *startkey = ep->me_key;
                Py_INCREF(startkey);
                int cmp = PyObject_RichCompareBool(startkey, key, Py_EQ);
                Py_DECREF(startkey);
                if (cmp < 0) {
                    *value_addr = NULL;
                    return DKIX_ERROR;
                }
                if (dk == mp->ma_keys && ep->me_key == startkey) {
                    if (cmp > 0) {
                        *value_addr = ep->me_value;
                        return ix;
                    }
                }
                else {
                    /* The dict was mutated, restart */
                    goto top;
                }
            }
        }
        perturb >>= PERTURB_SHIFT;
        i = (i*5 + perturb + 1) & mask;
    }
    Py_UNREACHABLE();
}
...
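
The last two lines of the loop decide which slot is tried next after a collision. The sketch below replays that recurrence in Python for a table of 8 slots. It assumes `PERTURB_SHIFT` is 5, as in CPython's dictobject.c, and a 64-bit `size_t`; `probe_sequence` is a hypothetical helper name.

```python
from itertools import islice

PERTURB_SHIFT = 5  # assumed value, matching CPython's dictobject.c


def probe_sequence(key_hash, mask):
    """Yield the slots lookdict would visit for this hash, in order."""
    # Emulate the C cast to an unsigned 64-bit size_t.
    perturb = key_hash & 0xFFFFFFFFFFFFFFFF
    i = perturb & mask
    while True:
        yield i
        perturb >>= PERTURB_SHIFT
        i = (i * 5 + perturb + 1) & mask


# hash(7) is 7 for small integers, so the sequence starts at slot 7.
slots = list(islice(probe_sequence(hash(7), mask=7), 8))
print(slots)  # [7, 4, 5, 2, 3, 0, 1, 6]
```

Once `perturb` has been shifted down to zero, the recurrence `i*5 + 1` walks through every slot of the table, which is why the loop can rely on eventually hitting an empty one.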

Why even if X is Y, Y could be different than X

>>> X = 256
>>> Y = 256
>>> X is Y
True
>>> X1 = 257
>>> Y1 = 257
>>> X1 is Y1
False

#define _PY_NSMALLPOSINTS           257
#define _PY_NSMALLNEGINTS           5
...
...
...
#if _PY_NSMALLNEGINTS + _PY_NSMALLPOSINTS > 0
    /* Small integers are preallocated in this array so that they
       can be shared.
       The integers that are preallocated are those in the range
       -_PY_NSMALLNEGINTS (inclusive) to _PY_NSMALLPOSINTS (not inclusive).
    */
    PyLongObject* small_ints[_PY_NSMALLNEGINTS + _PY_NSMALLPOSINTS];
#endif
};
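
To find the edges of this cache from Python, you can build the same integer twice at runtime and check whether both names point at one object. This is a sketch of CPython-specific behaviour: `same_object` is a hypothetical helper, the result is an implementation detail rather than a language guarantee, and the exact boundaries can differ between versions and implementations. With the values in the header above, the cached range would run from -5 to 256.

```python
def same_object(n):
    # Build the integer twice at runtime so the compiler cannot fold
    # both values into a single shared constant.
    a = int(str(n))
    b = int(str(n))
    return a is b


for n in (-6, -5, 256, 257):
    print(n, same_object(n))
```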

Singletons!

>>> id(X)
140659347485600
>>> id(Y)
140659347485600

>>> id(X1)
140659338270544
>>> id(Y1)
140659338270352
>>> X1 == Y1
True

== compares values

is compares objects
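
The same distinction shows up with any object, not just integers. Here is a short sketch with lists, where the variable names are only for illustration:

```python
first = [1, 2, 3]
second = [1, 2, 3]
alias = first

print(first == second)  # True: same contents
print(first is second)  # False: two separate list objects
print(first is alias)   # True: two names for one object
```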

>>> X1 = 257
>>> Y1 = 257
>>> X1 is Y1
True

After updating CPython's code

Python philosophy

  • Beautiful is better than ugly.
  • Explicit is better than implicit.
  • Simple is better than complex.
  • Complex is better than complicated.
  • Flat is better than nested.
  • Sparse is better than dense.
  • Readability counts.
  • Special cases aren't special enough to break the rules.
  • Although practicality beats purity.

index = 0
for obj in objects:
    index += 1
    print(f'Position: {index}, Object: {obj}')

Not Pythonic

for index, obj in enumerate(objects, start=1):
    print(f'Position: {index}, Object: {obj}')

Pythonic

Contributing to Python

THANKS

PyconCo 2020 presentation
