Advanced Python (sort of)

Brief

Learn English first

Who am I

  • Andrii Ursulenko
  • not working at Evo
  • but worked here from June 2013 to October 2016
  • was a team lead, worked on the prom family of projects, kabanchik, goodini and a couple more
  • this is my third time participating in Summer Python Lab
  • 3 years of Java, 4 years of Python and 2 years of Go
  • having fun with Clojure and Erlang
  • github.com/a-urth
  • a-urth.github.io
  • a.ursulenko@gmail.com
  • skype: a.ursulenko (but I dare You not to)

Why this is the best (IMO) place to start

  • It's hard to get here
  • There are tons of cool and useful things to learn
  • You'll definitely learn something new

What we will talk about

  • Python's object model
  • Classes
  • Metaclasses
  • Descriptors

Disclaimer #1

CPython, Python3

Disclaimer #2

  • I'm not the ultimate authority and not an expert

  • You may know everything already

  • There are a lot of things in here

  • You may remember only 10% of that

  • It may be overwhelming, but bear with me

  • All of this is just my point of view

Goal with a capital G

Inspire You and clarify some details

And yeah, I'm struggling with jet lag...

Python's object/data model

https://docs.python.org/3/reference/datamodel.html

Data types in Python?

None, NotImplemented, Ellipsis, int, bool, float, complex, str, tuple, bytes, list, bytearray, set, frozenset, dict, user-defined function, instance method, generator function, coroutine function, asynchronous generator function, built-in function, built-in method, class, class instance, module, file object, code object, frame object, traceback object, slice object, static method, class method.

What does it mean?

  • Dynamic typing
  • No primitive types
  • Every object is a PyObject

PyObject

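typedef struct _object {
    PyObject_HEAD
} PyObject;

/* the header fields behind PyObject_HEAD (simplified): */
struct _object *_ob_next;     /* only in debug builds with Py_TRACE_REFS */
struct _object *_ob_prev;     /* only in debug builds with Py_TRACE_REFS */
Py_ssize_t ob_refcnt;         /* reference counter */
struct _typeobject *ob_type;  /* pointer to the object's type */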

id, is

# id (built-in function)
# returns object's address in memory (in CPython)

In [1]: id(1)
Out[1]: 4452383728

In [2]: id('')
Out[2]: 4453206704

In [3]: id(None)
Out[3]: 4452107176

In [4]: id([])
Out[4]: 4477498568

In [5]: id(...)
Out[5]: 4452118880
# is (operator) compares objects
# by their identity, i.e. by their
# address in memory

In [6]: 1 is 1
Out[6]: True

In [7]: -6 is -6
Out[7]: False

In [9]: 'asd' is 'asd'
Out[9]: True

In [10]: a, b = 'asd!', 'asd!'

In [11]: a is b
Out[11]: True

In [12]: a = 'asd!'

In [13]: b = 'asd!'

In [14]: a is b
Out[14]: False

int

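  • int is an arbitrary-precision number stored as an array of 30-bit "digits" (each in a uint32_t on most platforms); float is a single C double
  • small integers are preallocated in a static array at startup to reduce malloc calls
  • small integers from -5 to 256 inclusive are cached, so they are always the same objects
  • for any other literals the result of is depends on the CPython version and on how the code was compiled, so never rely on it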
In [28]: -6 is -6
Out[28]: False

In [29]: 257 is 257
Out[29]: True

In [36]: a = 257

In [37]: b = 257

In [38]: a is b
Out[38]: False

str

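  • immutability matters
  • string interning FTW
  • the empty string and single-character (Latin-1) strings are cached as singletons
  • string constants that look like identifiers (only letters, digits and underscores) are interned during compilation
  • constant expressions such as 'a' * 20 are folded during compilation only up to a size limit (20 in CPython 3.6 and earlier), so 'a' * 21 is built at runtime and not interned
  • equal constants compiled together (e.g. on one IPython line) share one object even if they are not interned
  • interned strings are kept in a dict, so they are looked up by hash and equality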
In [1]: 'a' * 20 is 'aaaaaaaaaaaaaaaaaaaa'
Out[1]: True

In [2]: 'a' * 21 is 'aaaaaaaaaaaaaaaaaaaaa'
Out[2]: False

In [11]: 'a' + 'sd' is 'asd'
Out[11]: True

In [12]: ''.join(['a', 's', 'd']) is 'asd'
Out[12]: False

In [3]: 'asd!' is 'asd!'
Out[3]: True

In [6]: a, b = 'asd!', 'asd!'

In [7]: a is b
Out[7]: True

In [8]: a = 'asd!'

In [9]: b = 'asd!'

In [10]: a is b
Out[10]: False

list, tuple, dict

  • lists are dynamic arrays: they grow and shrink as needed
  • tuples are reused in a similar spirit (sort of): for each tuple size below 20, up to 2000 freed tuples are kept on a free list for reuse, and the empty tuple is a singleton
  • dicts are hash tables...

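bool, None, Ellipsis, NotImplemented

  • bools are ints, literally (bool is a subclass of int)
  • True, False, None, NotImplemented and Ellipsis (...) are singletons
  • and that's why you should ALWAYS compare with None, Ellipsis and NotImplemented by means of is (for True and False, PEP 8 prefers a plain if x:)
  • NotImplemented is not NotImplementedError and it's not an exception: it's a special value returned by binary magic methods (__eq__, __add__, etc.) to say "I can't handle this operand, try the other one"
  • Ellipsis is pretty useful, but don't abuse it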
In [1]: True + True == 2 - False
Out[1]: True

In [2]: a = True

In [3]: a is True
Out[3]: True

In [4]: a = ...

In [5]: a is ...
Out[5]: True

In [6]: a = None

In [7]: a is None
Out[7]: True
# Ellipsis is useful when You need a default
# value but None is also a valid option

default = ...


def update_user(name, email, address=default):
    # db is a hypothetical data-access layer, not a real library
    user = db.get_user(email)
    if address is not default:
        user.address = address

    user.name = name
    user.email = email

    db.update_user(user)
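The NotImplemented bullet above is easiest to see in a binary magic method. A minimal sketch, assuming a hypothetical Meters class: returning NotImplemented (not raising NotImplementedError) lets Python try the other operand and only then decide what to do.

```python
class Meters:
    """Hypothetical value type used only for this illustration."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, Meters):
            return NotImplemented  # "I can't handle it": let Python try the other operand
        return self.value == other.value

    def __add__(self, other):
        if not isinstance(other, Meters):
            return NotImplemented  # Python will then try other.__radd__
        return Meters(self.value + other.value)


same = Meters(5) == Meters(5)          # True
mixed = Meters(5) == 5                 # both sides say NotImplemented, falls back to identity: False
total = (Meters(2) + Meters(3)).value  # 5

try:
    Meters(1) + 1                      # int.__radd__ also returns NotImplemented...
except TypeError as exc:               # ...so Python raises TypeError for us
    error = str(exc)

print(same, mixed, total)  # True False 5
print(error)
```

For == the final fallback is an identity check, while for + Python gives up with a TypeError; either way the method itself never raised anything.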

P.S.

If You ever consider tracking objects' uniqueness by their ids without keeping references to them, remember that an id is only guaranteed to be unique among objects that exist at the same time:

ids = set()  # just a set for some object's identifiers


def f():
    l2 = [1, 2, 3]  # create a list

    ids.add(id(l2))  # store its address in global registry


f()  # call function

l3 = ['1', '2', '3', '4']  # create new list with other data


print(id(l3) in ids)  # may print True: l2 was freed when f() returned, so l3 may reuse its address
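One way around it, sketched here with hypothetical helper names, is to keep the object itself next to its id: as long as the registry holds a strong reference, no other object can get the same address.

```python
registry = {}  # hypothetical registry: id -> the object itself


def register(obj):
    # storing obj keeps it alive, so its id can't be handed to another object
    registry[id(obj)] = obj


def is_registered(obj):
    # compare identity as well, not just the number
    return registry.get(id(obj)) is obj


def f():
    l2 = [1, 2, 3]
    register(l2)


f()

l3 = ['1', '2', '3', '4']

print(is_registered(l3))  # False: l2 is still alive, so l3 can't get its address
```

The price is that registered objects are never freed; for objects that support weak references, a weakref-based container avoids that.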

Writing code with a class

Class, OOP?

Why?

  • decoupling
  • logic extraction
  • reducing redundancy
  • abstraction

What?

  • encapsulation
  • inheritance
  • polymorphism
  • abstraction(?)

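Old style vs new style

Old

  • relevant to Python 2 only
  • not inherited from object
  • MRO is depth-first, left to right
  • lack a lot of features (e.g. super() and __slots__ don't work)
  • effectively deprecated (gone in Python 3)

New

  • subclassed from object (explicitly in Python 2, implicitly in Python 3)
  • MRO is computed with C3 linearization (for diamonds it looks breadth-first)
  • have all the basic magic methods, inherited from object
  • somewhat faster
  • somewhat bigger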
class Person:  # class names use CapWords (PEP 8)

    name = 'Person'

    def __init__(self, name):
        self._name = name
        self.__secret_name = 'secret-{}'.format(name)

    def get_name(self, *args, **kwargs):
        print('I am {}'.format(self))

    @classmethod
    def get_class_name(cls, *args, **kwargs):
        print('I am {}'.format(cls))

    @staticmethod
    def get_static_name():
        print('I have no name actually =(')


person = Person('Bob')
person.get_name()

Person.get_class_name()

Person.get_static_name()

print(person.name, person._name)
print(person.__secret_name)
I am <__main__.Person object at 0x10bbb8cf8>

I am <class '__main__.Person'>

I have no name actually =(

Person Bob

Traceback (most recent call last):
  File "t.py", line 29, in <module>
    print(person.__secret_name)
AttributeError: 'Person' object has no attribute '__secret_name'
  • it is still implicitly subclassed from object
  • __init__ is a magic method and there are A LOT of them
  • __secret_name is a sort of private field: name mangling stores it as _Person__secret_name, which is why the lookup above fails
  • self is just a naming convention
  • the first argument of an instance method is the instance itself
  • the @classmethod decorator makes the first argument the class object
  • @staticmethod turns a method into a plain function living inside a class: no implicit first argument
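The name mangling mentioned above can be checked directly. A minimal sketch reusing the Person constructor: the double-underscore name is rewritten by the compiler, it is not hidden.

```python
class Person:

    def __init__(self, name):
        self._name = name                              # "protected" by convention only
        self.__secret_name = 'secret-{}'.format(name)  # stored as _Person__secret_name


person = Person('Bob')

print(sorted(vars(person)))         # ['_Person__secret_name', '_name']
print(person._Person__secret_name)  # secret-Bob
```

So "private" here means "harder to clash with in subclasses", not "inaccessible".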

Constructor, destructor?

Java and C++ people, where you at?

__new__

  • constructor
  • creates and returns a new class instance, usually by means of object.__new__
  • called first

__init__

  • just a place for initialization
  • returns nothing (must return None)
  • called after __new__

__del__

  • sort of destructor (a finalizer, actually)
  • returns nothing
  • called when all references to the object are gone

class Object:

    def __new__(cls, *args, **kwargs):
        print(cls)
        return super().__new__(cls)

    def __init__(self, *args, **kwargs):
        print(self)

    def __del__(self):
        print('{} is down!'.format(self))


b = Object()

a = Object()
del a

Object()

<class '__main__.Object'>
<__main__.Object object at 0x10c6c9c18>

<class '__main__.Object'>
<__main__.Object object at 0x10c6c9c50>
<__main__.Object object at 0x10c6c9c50> is down!

<class '__main__.Object'>
<__main__.Object object at 0x10c6c9c50>
<__main__.Object object at 0x10c6c9c50> is down!

<__main__.Object object at 0x10c6c9c18> is down!

Used for

  • __new__ - control over instance creation, e.g. caching, singletons, etc.
  • __init__ - control over instance initialization, e.g. setting fields, opening resources, validation, etc.
  • __del__ - freeing resources, warnings, etc. CAUTION: it is not guaranteed to be called, and during interpreter shutdown the globals it relies on may already be gone.

del smth != smth.__del__(): del only removes one reference (the name), __del__ runs when the last reference is gone

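A quick mention of Python's GC

  • each object holds the number of references to it (ob_refcnt)
  • when it reaches 0, the memory is freed right away
  • an additional periodic cycle detector (the gc module) cleans up reference cycles
  • objects are grouped in generations: young ones are checked more often than old ones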
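Reference counting alone never frees a cycle, which is what the periodic collector is for. A minimal sketch with a hypothetical Node class; automatic collection is switched off for the demo so the result doesn't depend on when the collector would have kicked in:

```python
import gc
import weakref


class Node:

    def __init__(self):
        self.partner = None


def make_cycle():
    a, b = Node(), Node()
    a.partner, b.partner = b, a  # reference cycle: refcounts never drop to 0
    return weakref.ref(a)        # weak reference, doesn't keep a alive


gc.disable()  # turn off automatic collection so the demo is deterministic
try:
    ref = make_cycle()
    alive_before = ref() is not None  # True: the cycle keeps both nodes alive
    found = gc.collect()              # the cycle collector finds the unreachable nodes
    alive_after = ref() is not None   # False: the nodes have been freed
finally:
    gc.enable()

print(alive_before, alive_after)  # True False
```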

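And weakref

  • a reference to an object that doesn't increase its reference count
  • it goes dead (returns None) once all strong references are gone, and containers like WeakSet drop it automatically
  • use it when You need to track objects without interfering with their lifecycle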
from weakref import WeakSet


class A:
    pass


references = set()
weak_references = WeakSet()

a1, a2 = A(), A()

references.add(a1)
weak_references.add(a2)


print(references)
print(weak_references.data)

del a1
del a2

print(references)
print(weak_references.data)
{<__main__.A object at 0x10b532b38>}
{<weakref at 0x10b537b88; to 'A' at 0x10b532ba8>}

{<__main__.A object at 0x10b532b38>}
set()

super

  • a built-in for delegating method calls to a base/super class
  • since Python 3 it works without arguments
  • but it may still take them if You need
  • it's not that simple in terms of how it works
  • because, You know, multiple inheritance
  • and it totally relies on the MRO (more on that later)
  • and it's not a function: it's a class, and calling it returns a proxy object
class A:

    def foo(self):
        print('from A')


class B(A):

    def foo(self):
        print('from B')
        super().foo()


class C(A):

    def foo(self):
        print('from C')
        super().foo()


class D(B, C):

    def foo(self):
        print('from D')
        super().foo()


D().foo()
from D

from B

from C

from A

MRO

  • stands for method resolution order
  • the algorithm is called C3 linearization
  • it often looks like a breadth-first traversal, but strictly speaking it is not one
  • accessible from a class via the .mro() method (or the __mro__ attribute)
  • super() without arguments is equivalent to super(__class__, self), where __class__ is the class in which the method is defined
  • super(cls, obj) takes the MRO of type(obj) and starts searching right after cls; the next super() call does the same from its own class
D.mro()

[
    <class '__main__.D'>,
    <class '__main__.B'>,
    <class '__main__.C'>,
    <class '__main__.A'>,
    <class 'object'>
]
class A:
    def foo(self):
        print('from A')


class B(A):
    def foo(self):
        print('from B')
        super().foo()
        # eq to super(B, self).foo()


class C(A):
    def foo(self):
        print('from C')
        super().foo()
        # eq to super(C, self).foo()


class D(B, C):
    def foo(self):
        print('from D')
        super().foo()
        # eq to super(D, self).foo()


D().foo()

(1) We are in class D, so super() gives B

(2) We are in class B, but the MRO is still D's, so super() gives C

(3) We are in class C, so super() gives A

(4) We are in class A, which doesn't call super(), so the chain stops
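The steps above can be checked by calling super() with explicit arguments. A minimal sketch of the same diamond, with methods returning lists instead of printing so the order is easy to see:

```python
class A:
    def who(self):
        return ['A']


class B(A):
    def who(self):
        return ['B'] + super().who()  # same as super(B, self).who()


class C(A):
    def who(self):
        return ['C'] + super().who()


class D(B, C):
    def who(self):
        return ['D'] + super().who()


d = D()

print([k.__name__ for k in D.__mro__])  # ['D', 'B', 'C', 'A', 'object']
print(d.who())                          # ['D', 'B', 'C', 'A']
print(super(B, d).who())                # ['C', 'A']: search starts after B in D's MRO
print(B().who())                        # ['B', 'A']: B's own MRO has no C in it
```

The same B.who leads to C or to A depending only on the type of self, which is exactly why super() is "next in the MRO" rather than "my parent".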

So, can You redefine mro?

class A:
    def foo(self):
        print('from A')


class B(A):
    def foo(self):
        print('from B')
        super().foo()


class C(A):
    def foo(self):
        print('from C')
        super().foo()


class D(B, C):

    @classmethod
    def mro(cls):
        return [D, B, A]

    def foo(self):
        print('from D')
        super().foo()


print(D.mro())

D().foo()

Yes, but not the way You'd expect it to be

And I dare You not to

[
    <class '__main__.D'>,
    <class '__main__.B'>,
    <class '__main__.A'>
]

from D
from B
from C
from A
class meta(type):

    @classmethod
    def mro(cls):
        return [B, A, object]


class A:
    def foo(self):
        print('from A')


class B(A):
    def foo(self):
        print('from B')
        super().foo()


class C(A):
    def foo(self):
        print('from C')
        super().foo()


class D(B, C, metaclass=meta):

    def foo(self):
        print('from D')
        super().foo()


D().foo()
from B
from A

Since the MRO of class D doesn't contain D itself, D's namespace isn't even looked up

So it just goes to classes B and A

If You don't include object, You won't be able to create instances at all

__dict__

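  • Every instance of a regular class has a __dict__ attribute
  • It contains the attributes that belong to this object itself (not to its class)
  • For instances it's a regular dict, so it's mutable (a class's __dict__ is a read-only mappingproxy)
  • So You can add attributes dynamically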
class A:
    '''This class literally does nothing'''

    a = None

    def __init__(self, name, surname=None):
        self.name = name

        if surname is not None:
            self.surname = surname

    def foo(self):
        print('My name is {}'.format(self.name))


print('__dict__' in dir(A))  # 1

print(A.__dict__)  # 2

print(A('Bill').__dict__)  # 3

print(A('Bill', 'Billinsons').__dict__)  # 4

john = A('John')
john.__dict__['middlename'] = 'J'
print(john.middlename)  # 5

(1) True
(2)
{
    'foo': <function A.foo at 0x10dd82158>,
    'a': None,
    '__dict__': <attribute '__dict__' of 'A' objects>,
    '__weakref__': <attribute '__weakref__' of 'A' objects>,
    '__module__': '__main__',
    '__init__': <function A.__init__ at 0x10dd820d0>,
    '__doc__': 'This class literally does nothing'
}

(3) {'name': 'Bill'}
(4) {'surname': 'Billinsons', 'name': 'Bill'}
(5) J

The problem

  • a dict is a hash table that starts with a minimum size (8 slots in CPython) and grows when it is about 2/3 full
  • so, even if Your class has one, two or no attributes, each instance's dict is created with preallocated memory
  • a lot of objects -> a lot of dicts -> a lot of arrays -> a lot of memory
  • dict lookups are slower than fixed-offset access
  • mutability is not always good

What to do?

__slots__ to the rescue!

  • the set of attributes is fixed
  • attribute values are stored in a fixed-size array inside the instance, accessed through member descriptors
  • somewhat faster access
  • less memory (it depends, but in general 40-50% less)
class A:

    __slots__ = ('a', 'name')

    def __init__(self, name):
        self.a = 'A!'
        self.name = name

    def foo(self):
        print('My name is {}'.format(self.name))


print('__dict__' in dir(A))  # 1

print(A.__dict__)  # 2

bill = A('Bill')

print(bill.a, bill.name)  # 3

bill.foo()  # 4

print(bill.__dict__)  # 5
(1) False
(2)
{
    'name': <member 'name' of 'A' objects>,
    'foo': <function A.foo at 0x1065fd158>,
    'a': <member 'a' of 'A' objects>,
    '__module__': '__main__',
    '__doc__': None,
    '__slots__': ('a', 'name'),
    '__init__': <function A.__init__ at 0x1065fd0d0>
}

(3) A! Bill

(4) My name is Bill
(5)
Traceback (most recent call last):
  File "t.py", line 21, in <module>
    print(bill.__dict__)
AttributeError: 'A' object has no attribute '__dict__'
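The flip side of a fixed set of attributes: assigning anything not listed in __slots__ fails. A minimal sketch with a hypothetical Point class, which also shows the member descriptors that appear in the class __dict__ above:

```python
class Point:

    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)

try:
    p.z = 3  # 'z' is not in __slots__ and there is no __dict__ to fall back to
except AttributeError as exc:
    error = str(exc)

print(hasattr(p, '__dict__'))  # False
print(type(Point.x).__name__)  # member_descriptor
print(error)
```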

But what about interfaces?

In short - there are no such things.

There are protocols, but they are no more than a convention

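class ContextManager:

    def __enter__(self):
        return self  # the value bound by "with ... as x"

    def __exit__(self, *exc_info):
        return False  # falsy: don't suppress exceptions


class Iterable:

    def __iter__(self):
        return Iterator()  # must return an iterator


class Iterator:

    def __init__(self, limit=3):
        self.current, self.limit = 0, limit

    def __iter__(self):
        return self  # iterators are iterable too

    def __next__(self):
        if self.current >= self.limit:
            raise StopIteration  # signals that the iterator is exhausted
        self.current += 1
        return self.current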
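Even without interfaces, the standard library can check a protocol structurally: the ABCs in collections.abc recognise a class by the methods it defines, with no inheritance required. A minimal sketch with a hypothetical Countdown class:

```python
from collections.abc import Iterable, Iterator


class Countdown:
    """Implements only __iter__ and inherits from nothing but object."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        n = self.start
        while n > 0:
            yield n  # a generator function returns a ready-made iterator
            n -= 1


countdown = Countdown(3)

print(isinstance(countdown, Iterable))  # True: the ABC checks for __iter__
print(isinstance(countdown, Iterator))  # False: no __next__
print(list(countdown))                  # [3, 2, 1]
```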

Abstract classes

  • Classes with at least one abstract method
  • Describe the base usage and the intended further implementation
  • Not meant to be instantiated
  • Their only purpose is to be inherited from
Wrong

class BaseMerger:

    def get(self, _id):
        raise NotImplementedError()

    def create(self, data):
        raise NotImplementedError()

    def update(self, _id, data):
        raise NotImplementedError()

    def merge(self, _id, new_data):
        old_obj = self.get(_id)
        if old_obj is None:
            return self.create(new_data)

        return self.update(_id, new_data)

  • somewhat works
  • objects can still be created
  • implementing the "abstract" methods is not enforced: You find out only when one of them is called

Correct

from abc import ABCMeta, abstractmethod


class BaseMerger(metaclass=ABCMeta):

    @abstractmethod
    def get(self, _id):
        pass

    @abstractmethod
    def create(self, data):
        pass

    @abstractmethod
    def update(self, _id, data):
        pass

    def merge(self, _id, new_data):
        old_obj = self.get(_id)
        if old_obj is None:
            return self.create(new_data)

        return self.update(_id, new_data)

  • works
  • everything is controlled by the metaclass: a class that doesn't implement all abstract methods can't be instantiated
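Here is what "controlled by the metaclass" means in practice. A minimal sketch using abc.ABC (a helper class whose metaclass is ABCMeta) and a hypothetical in-memory DictMerger that expects each record to carry an 'id' key:

```python
from abc import ABC, abstractmethod


class BaseMerger(ABC):  # ABC is a helper class whose metaclass is ABCMeta

    @abstractmethod
    def get(self, _id):
        pass

    @abstractmethod
    def create(self, data):
        pass

    @abstractmethod
    def update(self, _id, data):
        pass

    def merge(self, _id, new_data):
        if self.get(_id) is None:
            return self.create(new_data)

        return self.update(_id, new_data)


class DictMerger(BaseMerger):
    """Hypothetical in-memory implementation; expects data to carry an 'id' key."""

    def __init__(self):
        self.storage = {}

    def get(self, _id):
        return self.storage.get(_id)

    def create(self, data):
        self.storage[data['id']] = dict(data)
        return self.storage[data['id']]

    def update(self, _id, data):
        self.storage[_id].update(data)
        return self.storage[_id]


try:
    BaseMerger()  # abstract methods are not implemented
except TypeError as exc:
    error = str(exc)

merger = DictMerger()
merger.merge(1, {'id': 1, 'name': 'Bill'})  # creates
merger.merge(1, {'name': 'Bob'})            # updates

print(error)
print(merger.storage)  # {1: {'id': 1, 'name': 'Bob'}}
```

The exact wording of the TypeError differs between Python versions, but it is raised at instantiation time, not when a missing method is finally called.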

And one more thing, actually more than one

  • Do not use mutable objects as class attributes
  • Treat class attributes as constants or default values
  • Do not update class attributes
  • Do not mix class and instance members
  • prefer plain functions over staticmethods
  • Do not use type(smth) == Smth, use isinstance(smth, Smth)
Wrong

class Storage:

    __storage = []  # one list shared by ALL instances and subclasses

    def store(self, val):
        self.__storage.append(val)

    def __repr__(self):
        return str(self.__storage)


class FutureStorage(Storage):
    pass


s = Storage()
s.store(1)
s.store(2)
print(s)  # 1

ss = Storage()
print(ss)  # 2

fs = FutureStorage()
print(fs)  # 3
(1) [1, 2]

(2) [1, 2]

(3) [1, 2]

Correct

class Storage:

    __storage = None

    def __init__(self):
        self.__storage = []

    def store(self, val):
        self.__storage.append(val)

    def __repr__(self):
        return str(self.__storage)


class FutureStorage(Storage):
    pass


s = Storage()
s.store(1)
s.store(2)
print(s)  # 1

ss = Storage()
print(ss)  # 2

fs = FutureStorage()
print(fs)  # 3
(1) [1, 2]

(2) []

(3) []

Metaclasses

  • Have nothing to do with meta-programming
  • Mainly used to create classes, the same way classes are used to create objects
  • Control over classes in general
  • Sometimes contain logic shared between classes
  • Also may be used to control object creation
  • Used rarely
  • Any callable works as a metaclass: usually a class, but it may be a function
  • If You don't know what metaclasses are, 99% You don't actually need them

There is one metaclass that all of You have already used

type()

type(class_name, base_classes, fields)

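  • with three arguments it creates class objects, not only returns an object's type
  • type != class statement
  • the default metaclass, by means of which classes are created
  • type is an instance of itself and a subclass of object, while object is an instance of type
  • think on that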
Person = type('Person', (), {'name': '', 'surname': ''})

print(type(Person))  # 1
print(Person)  # 2

# neither of these works: the class has no __init__ that accepts arguments
# p = Person('Patrick', 'Oneil')
# p = Person(name='Patrick', surname='Oneil')
p = Person()
p.name, p.surname = 'Patrick', 'Oneil'

print(type(p))  # 3
print(p)  # 4
print(p.name, p.surname)  # 5
(1) <class 'type'>
(2) <class '__main__.Person'>
(3) <class '__main__.Person'>
(4) <__main__.Person object at 0x10a373be0>
(5) Patrick Oneil
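Since any callable can serve as a metaclass, a plain function that ends up calling type() works too. A minimal sketch with a hypothetical upper_attrs function that uppercases public attribute names:

```python
def upper_attrs(name, bases, namespace):
    """Hypothetical function used as a metaclass: uppercases public attribute names."""
    renamed = {
        key if key.startswith('__') else key.upper(): value
        for key, value in namespace.items()
    }
    return type(name, bases, renamed)  # the actual class is still built by type


class Config(metaclass=upper_attrs):
    debug = True
    host = 'localhost'


print(Config.DEBUG, Config.HOST)  # True localhost
print(hasattr(Config, 'debug'))   # False
print(type(Config))               # <class 'type'>
```

The catch: type(Config) is plain type, so subclasses of Config are built by type and skip upper_attrs. A metaclass written as a subclass of type is inherited by subclasses, which is one reason classes are the usual choice.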
class Versioned(type):

    @classmethod
    def __prepare__(metacls, name, bases, version=None, **kwds):
        # called first: builds the namespace the class body is executed in
        namespace = {}
        if version is not None:
            namespace['version'] = version

        return namespace

    def __new__(metacls, cls_name, bases, params, **kwds):
        # creates the class object; extra keywords are not passed on to type
        print('Creating class object for {}'.format(cls_name))
        return super().__new__(metacls, cls_name, bases, params)

    def __init__(cls, name, bases, params, **kwds):
        # initializes the already created class object
        super().__init__(name, bases, params)


class Person(metaclass=Versioned, version=1):
    pass


print(Person().version)
Creating class object for Person
1
  • __prepare__ creates the namespace before the class body is executed and returns a dict-like object
  • all keyword arguments passed to the class statement (except metaclass) go through the class creation process: the metaclass's __prepare__, __new__ and __init__

To create a class

  • subclass type and/or call it directly
  • redefine __new__ and/or __init__

To create an instance

  • subclass type and/or call it directly
  • redefine __call__
class Singletoned(type):

    @classmethod
    def __prepare__(cls, name, bases, **kwds):
        return {'_obj': None}

    def __call__(cls, *args, **kwargs):
        if cls._obj is None:
            print('Creating new object for {}'.format(cls))
            cls._obj = super().__call__(*args, **kwargs)

        return cls._obj


class A(metaclass=Singletoned):
    pass


class B(metaclass=Singletoned):
    pass


a = A()
b = A()

print(a)  # 1
print(b)  # 2

a = B()
b = B()

print(a)  # 3
print(b)  # 4
Creating new object for <class '__main__.A'>
(1) <__main__.A object at 0x10c9b0c88>
(2) <__main__.A object at 0x10c9b0c88>

Creating new object for <class '__main__.B'>
(3) <__main__.B object at 0x10c9b0cc0>
(4) <__main__.B object at 0x10c9b0cc0>

Attributes lookup

  • anything not found on an instance is looked up in its class (following the class's MRO)
  • anything not found on a class (and its MRO) is looked up in its metaclass
  • except __new__ and __init__ used for creating instances: they always come from the class's MRO (object defines both), never from the metaclass, whose own __new__ and __init__ create the class itself
  • this shit is hard

Instance attribute lookup

Class attribute lookup
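Both lookup chains can be traced with a small experiment. A minimal sketch with hypothetical Meta and Base classes, using plain (non-descriptor) attributes:

```python
class Meta(type):
    source = 'metaclass'
    only_on_meta = 'found on the metaclass'


class Base(metaclass=Meta):
    source = 'class'


obj = Base()

print(obj.source)         # class: nothing in the instance __dict__, found in the class
obj.source = 'instance'
print(obj.source)         # instance: the instance __dict__ wins now
print(Base.source)        # class: found in the class itself
print(Base.only_on_meta)  # found on the metaclass: fallback for the class
print(hasattr(obj, 'only_on_meta'))  # False: instances never look into the metaclass

del Base.source
print(Base.source)        # metaclass: nothing left in Base's MRO
```

Descriptors complicate this picture (data descriptors take priority), which is what the next section is about.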

Summary

  • ORMs
  • template engines
  • interviews
  • very few other places
  • they are rare in general
  • as previously said, most likely You won't use them directly
  • but You'd better know about them, because You are highly likely to face them

Metaclass in a nutshell

  • hard to learn
  • hard to pull off
  • very powerful
  • looks awesome
  • easily avoidable

Descriptors

You know, Python's, not the file ones

Why?

  • Very handy control over class/instance fields
  • Property/binding behavior
  • "Learning about descriptors not only provides access to a larger toolset, it creates a deeper understanding of how Python works and an appreciation for the elegance of its design." Python documentation

What?

A class implementing the descriptor protocol

How?

  • __get__
  • __set__
  • __delete__
class Person:

    name = ''
    age = 0

    def __check_name(self, name):
        if not isinstance(name, str):
            raise TypeError

        return name

    def __check_age(self, age):
        if not isinstance(age, int):
            raise TypeError

        return age

    def __init__(self, name, age):
        self.age = age
        self.name = name

    def __setattr__(self, name, value):
        if name == 'name':
            value = self.__check_name(value)
        elif name == 'age':
            value = self.__check_age(value)

        super().__setattr__(name, value)

    def __repr__(self):
        return 'Person: {} {}'.format(self.name, self.age)


p = Person('Bill', 29)
print(p)

p.name = 123
Person: Bill 29

Traceback (most recent call last):
  File "t.py", line 38, in <module>
    p.name = 123
  File "t.py", line 25, in __setattr__
    value = self.__check_name(value)
  File "t.py", line 8, in __check_name
    raise TypeError
TypeError

Problem

  • redundancy
  • no full control
  • looks ugly
class Typed:

    def __init__(self, _type, default=...):
        self.type = _type
        # without an explicit default, use the type's "empty" value: str() == '', int() == 0
        self.default = _type() if default is ... else default

    def __set_name__(self, owner, name):
        self.name = name  # the attribute name, e.g. 'age' (Python 3.6+)

    def __set__(self, obj, value):
        if not isinstance(value, self.type):
            raise TypeError

        obj.__dict__[self.name] = value  # store per instance, not on the shared descriptor

    def __get__(self, obj, type=None):
        if obj is None:  # accessed on the class itself
            return self

        return obj.__dict__.get(self.name, self.default)

    def __delete__(self, obj):
        raise AttributeError


class Person:

    name = Typed(str)
    age = Typed(int)

    def __init__(self, name, age):
        self.age = age
        self.name = name

    def __repr__(self):
        return 'Person: {} {}'.format(self.name, self.age)


p = Person('Bill', 29)
print(p)

p.name = 123
Person: Bill 29

Traceback (most recent call last):
  File "t.py", line 43, in <module>
    p.name = 123
  File "t.py", line 13, in __set__
    raise TypeError
TypeError

Important

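  • Non-data descriptors define only __get__
  • Data descriptors define __set__ and/or __delete__ (usually along with __get__)
  • Read-only data descriptors define both __get__ and __set__, but __set__ raises AttributeError
  • Descriptors work through __getattribute__ (object.__getattribute__ for instances, type.__getattribute__ for classes)
  • Data descriptors always override instance dictionaries, while instance dictionaries override non-data descriptors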
class Descriptor:

    def __set__(self, obj, value):
        pass

    def __get__(self, obj, type=None):
        return 'Look at me!'


class A:

    a = Descriptor()

    def __init__(self):
        self.a = 'Hey ya!'


a = A()
print(a.a)
Look at me!
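The opposite case, a non-data descriptor, loses to the instance dictionary. A minimal sketch with a hypothetical NonData descriptor:

```python
class NonData:
    """Only __get__: a non-data descriptor."""

    def __get__(self, obj, objtype=None):
        return 'from descriptor'


class A:

    a = NonData()


x = A()
print(x.a)  # from descriptor: nothing in the instance __dict__ yet

x.a = 'from instance'  # no __set__, so this simply goes to x.__dict__
print(x.a)  # from instance: the instance __dict__ now wins
print(A.a)  # from descriptor: on the class, __get__ is called with obj=None
```

This is the same rule that lets You shadow a method on a single instance by assigning to it.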
class Property(object):

    def __init__(self, fget=None, fset=None, fdel=None, doc=None):
        self.fget = fget
        self.fset = fset
        self.fdel = fdel
        if doc is None and fget is not None:
            doc = fget.__doc__
        self.__doc__ = doc

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        if self.fget is None:
            raise AttributeError("unreadable attribute")
        return self.fget(obj)

    def __set__(self, obj, value):
        if self.fset is None:
            raise AttributeError("can't set attribute")
        self.fset(obj, value)

    def __delete__(self, obj):
        if self.fdel is None:
            raise AttributeError("can't delete attribute")
        self.fdel(obj)
class Square:

    def __init__(self, width, height):
        self.width = width
        self.height = height

    def _area(self):
        return self.width * self.height

    area = Property(_area)

    del _area


box = Square(7, 15)

print(box.area)
box.area = 150
105

Traceback (most recent call last):
  File "t.py", line 56, in <module>
    box.area = 150
  File "t.py", line 21, in __set__
    raise AttributeError("can't set attribute")
AttributeError: can't set attribute
  • del _area is just to clean up the class namespace a little
  • in real code, use the built-in @property decorator instead
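With the built-in decorator the same read-only area takes a few lines, and a setter adds validation for free. A minimal sketch; the positive-width check is a hypothetical rule added for illustration:

```python
class Square:

    def __init__(self, width, height):
        self.width = width  # goes through the width setter below
        self.height = height

    @property
    def width(self):
        return self._width

    @width.setter
    def width(self, value):
        if value <= 0:
            raise ValueError('width must be positive')
        self._width = value

    @property
    def area(self):  # no setter: read-only
        return self.width * self.height


box = Square(7, 15)
print(box.area)  # 105

box.width = 10
print(box.area)  # 150

try:
    box.area = 150
except AttributeError:
    print('area is read-only')
```

The AttributeError message for the read-only property varies between Python versions, so only the exception type is worth relying on.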

Use cases

  • A lot
  • Functions are non-data descriptors: that is what turns Your method call obj.f(*args) into f(obj, *args)
  • @property, @classmethod, @staticmethod
  • ORMs
  • Forms and other validators
  • etc, etc, etc
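The method-binding use case can be done by hand. A minimal sketch with a hypothetical Person class: a function stored in a class only has __get__, and calling it with an instance produces the bound method that obj.f normally gives You.

```python
class Person:

    def __init__(self, name):
        self.name = name

    def greet(self, greeting):
        return '{}, {}!'.format(greeting, self.name)


bill = Person('Bill')
func = Person.__dict__['greet']  # the plain function stored in the class

print(hasattr(func, '__get__'), hasattr(func, '__set__'))  # True False: non-data descriptor

bound = func.__get__(bill, Person)  # what bill.greet does behind the scenes
print(bound('Hello'))               # Hello, Bill!
print(bound.__self__ is bill)       # True
print(bill.greet('Hi') == func(bill, 'Hi'))  # True
```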

Advanced Python (sort of)

By Andrii Ursulenko
