Python's Data Structures: Basics and Advanced

PyRoma 2015

About Me

Full Name: Andrea Iuliano

Age: 25

Master Degree: Computer Science

At: UniRoma3

Passions:

Computer Vision
Videogames
Animals (reptiles lover :D)

Contacts:

github.com/Pausa90
andreaiuliano90@gmail.com

Computer Vision Engineer

"if I can see it, it can see it"

More data structures?

Python already offers general-purpose built-in data structures, each with its own set of operations:

list
tuple
set
frozenset
dict

List

The most basic data structure offered by Python.

Elements are stored in an ordered sequence and each one can be accessed by its integer index (e.g. list1[0]).

list1 = ['physics', 'chemistry', 1997, 2000]
list2 = [1, 2, 3, 4, 5]
list3 = ["a", "b", "c", "d"]
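A quick sketch of index-based access, reusing list1 and list3 from above (the variable names first, last and middle are just illustrative): indices start at 0, negative indices count from the end, and a slice returns a new list.

```python
list1 = ['physics', 'chemistry', 1997, 2000]
list3 = ["a", "b", "c", "d"]

first = list1[0]       # 'physics': indices start at 0
last = list1[-1]       # 2000: negative indices count from the end
middle = list3[1:3]    # ['b', 'c']: a slice builds a new list
list3[0] = "z"         # lists are mutable, so items can be reassigned

print(first, last, middle, list3)
```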

List

Lists can also be used as stacks: append() and pop() give easy LIFO (last-in, first-out) access.

>>> stack = [3, 4, 5]
>>> stack.append(6)
>>> stack.append(7)
>>> stack
[3, 4, 5, 6, 7]
>>> stack.pop()
7
>>> stack
[3, 4, 5, 6]

Tuple

An immutable collection of elements: once a tuple is created, you cannot change its elements or its size.

>>> tupl = (1, 2, 3, 4, 5, 6, 7)
>>> tupl[0]
1
>>> tupl[0] = 10
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
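Immutability applies to the tuple itself, not to the objects it holds. The sketch below (pair and grid are illustrative names) shows a list inside a tuple being modified, and why only tuples of hashable items can be used as dict keys.

```python
# The tuple itself is immutable, but a mutable object stored inside it can still change.
pair = (1, [2, 3])
pair[1].append(4)
print(pair)  # (1, [2, 3, 4])

# A tuple of hashable items is hashable, so it can be used as a dict key.
grid = {(0, 0): "origin", (1, 0): "east"}
print(grid[(0, 0)])  # origin

# A tuple that contains a list is not hashable.
try:
    hash(pair)
    pair_is_hashable = True
except TypeError:
    pair_is_hashable = False
print(pair_is_hashable)  # False
```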

Set

An unordered collection with no duplicate elements. It supports mathematical operations such as union, intersection and difference.

>>> set1 = set(range(0,5))
>>> set2 = set(range(3,8))
>>> set1
{0, 1, 2, 3, 4}
>>> set2
{3, 4, 5, 6, 7}

>>> set1.union(set2)
{0, 1, 2, 3, 4, 5, 6, 7}

>>> set1.intersection(set2)
{3, 4}

>>> set1.difference(set2)
{0, 1, 2}
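The same operations are also available as operators, plus the symmetric difference (elements in exactly one of the two sets). A minimal sketch on the two sets above:

```python
set1 = set(range(0, 5))
set2 = set(range(3, 8))

union = set1 | set2            # same as set1.union(set2)
common = set1 & set2           # same as set1.intersection(set2)
only_in_1 = set1 - set2        # same as set1.difference(set2)
in_exactly_one = set1 ^ set2   # same as set1.symmetric_difference(set2)
print(union, common, only_in_1, in_exactly_one)

# Building a set from an iterable drops duplicates.
unique = set([1, 1, 2, 3, 3, 3])
print(unique)  # {1, 2, 3}
```

One design difference: the methods accept any iterable (set1.union([9])), while the operators require both operands to be sets.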


Frozenset

Same as set, but immutable: once created it cannot be modified, and it is hashable.

>>> frozen_set = frozenset(range(0,5))
>>> frozen_set
frozenset({0, 1, 2, 3, 4})

Only the non-mutating methods are available: copy, difference, intersection, isdisjoint, issubset, issuperset, symmetric_difference and union.
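Because a frozenset is hashable, it can be a dict key or an element of another set, which a plain set cannot. A small sketch; the "edges" mapping of undirected edges to weights is a hypothetical use case:

```python
# Undirected edges: {a, b} and {b, a} are the same key.
edges = {frozenset({"a", "b"}): 3}
print(edges[frozenset({"b", "a"})])  # 3

# A set of frozensets: the first two elements are equal, so only one is kept.
groups = {frozenset({1, 2}), frozenset({2, 1}), frozenset({3})}
print(len(groups))  # 2

# A plain set cannot be used as a key.
try:
    {set({1, 2}): "nope"}
    set_is_hashable = True
except TypeError:
    set_is_hashable = False
print(set_is_hashable)  # False
```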

Dict

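A dictionary is a collection of (key : value) pairs in which every key is unique. (Regular dicts are guaranteed to keep insertion order only since Python 3.7; before that, their order was arbitrary.)

>>> person = {'Name': 'Valerio', 'Age': 25, 'Name': 'Andrea'}
>>> person          # a repeated key keeps only its last value
{'Name': 'Andrea', 'Age': 25}
>>> person['Name']
'Andrea'
>>> list(person.keys())
['Name', 'Age']
>>> list(person.values())
['Andrea', 25]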
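A few more everyday dict operations, sketched on a similar person dict (the 'City' and 'Surname' keys are illustrative): get() with a fallback value, membership tests, and iteration over key/value pairs.

```python
person = {'Name': 'Andrea', 'Age': 25}

# get() returns a fallback value instead of raising KeyError for a missing key.
city = person.get('City', 'unknown')
print(city)  # unknown

# 'in' checks the keys, not the values.
print('Age' in person, 25 in person)  # True False

# Adding a new key and updating an existing one use the same syntax.
person['Surname'] = 'Iuliano'
person['Age'] += 1

for key, value in person.items():
    print(key, '->', value)
```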

Advanced Collections

"So why do we need them? We already have useful primitive data structures that we can combine however we like!"

The correct answer is:

code reuse
well-tested code: no bugs "guaranteed"
high performance (in CPython, deque and defaultdict are implemented in C)

Collections

Introduced with Python2:

Counter
deque
defaultdict
namedtuple
OrderedDict

Added to collections in Python 3 (UserDict, UserList and UserString were separate modules in Python 2):

ChainMap
UserDict
UserList
UserString

Counter

It is a dict subclass which can easily count hashable objects.

>>> from collections import Counter
>>> char_counter = Counter("Hi, it works even with strings")
>>> char_counter
Counter({' ': 5, 'i': 4, 's': 3, 't': 3, 'e': 2, 'n': 2, 'r': 2, 'w': 2, 
        'g': 1, 'H': 1, 'k': 1, ',': 1, 'o': 1, 'v': 1, 'h': 1})

>>> string_counter = Counter(["Hi", ",", "it", "works", "even", "with", 
        "strings"])
>>> string_counter
Counter({'even': 1, ',': 1, 'it': 1, 'Hi': 1, 'works': 1, 'with': 1, 
        'strings': 1})

>>> string_counter.elements()
<itertools.chain at 0x7fe9b37ebb10>

>>> list(string_counter.elements())
['even', ',', 'it', 'Hi', 'works', 'with', 'strings']
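Counter differs from a plain dict in a few useful ways: a missing key counts as zero instead of raising KeyError, update() adds counts instead of replacing them, and unary + drops zero and negative counts. A short sketch using the string "abracadabra" as sample input:

```python
from collections import Counter

counts = Counter("abracadabra")
print(counts['a'])   # 5
print(counts['q'])   # 0: missing keys count as zero, and are not inserted

counts.update("aaz")             # adds to the existing counts
print(counts['a'], counts['z'])  # 7 1

counts['b'] -= 5                 # counts may drop to zero or below
positive = +counts               # unary + keeps only positive counts
print(positive)
```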

Counter

>>> # Find the ten most common words in Hamlet
>>> import re
>>> words = re.findall(r'\w+', open('hamlet.txt').read().lower())
>>> Counter(words).most_common(10)
[('the', 1143), ('and', 966), ('to', 762), ('of', 669), ('i', 631),
 ('you', 554),  ('a', 546), ('my', 514), ('hamlet', 471), ('in', 451)]


>>> # Some mathematical operations
>>> c = Counter(a=3, b=1)
>>> d = Counter(a=1, b=2)
>>> c + d                       # add two counters together:  c[x] + d[x]
Counter({'a': 4, 'b': 3})
>>> c - d                       # subtract (keeping only positive counts)
Counter({'a': 2})
>>> c & d                       # intersection:  min(c[x], d[x])
Counter({'a': 1, 'b': 1})
>>> c | d                       # union:  max(c[x], d[x])
Counter({'a': 3, 'b': 2})

Deque

A generalization of stacks and queues.

Advantages:

thread-safe appends and pops
memory-efficient appends and pops from both ends, with approximately O(1) performance in either direction (list.pop(0) and list.insert(0, x) are O(n))
it can be unbounded or bounded by maxlen: once a bounded deque is full, each new item added at one end discards an item from the opposite end
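A minimal sketch of the two cases above: a FIFO queue built with append() and popleft(), and a bounded deque that silently drops the oldest items (the job names are placeholders).

```python
from collections import deque

# FIFO queue: append() on the right, popleft() on the left, both O(1).
queue = deque()
for job in ["job1", "job2", "job3"]:
    queue.append(job)

served = []
while queue:
    served.append(queue.popleft())
print(served)  # ['job1', 'job2', 'job3']

# Bounded deque: once full, each append discards an item from the left end.
recent = deque(maxlen=2)
for n in range(5):
    recent.append(n)
print(recent)  # deque([3, 4], maxlen=2)
```

Doing the same with a list and list.pop(0) works, but every pop shifts all the remaining elements.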

Deque

>>> from collections import deque
>>> deq = deque([], 3)   # empty deque with maxlen=3
>>> deq.append(1)
>>> deq
deque([1], maxlen=3)

>>> deq.append(2)
>>> deq.append(3)
>>> deq
deque([1, 2, 3], maxlen=3)

>>> deq.append(4)
>>> deq
deque([2, 3, 4], maxlen=3)

>>> deq.appendleft(5)
>>> deq
deque([5, 2, 3], maxlen=3)

Deque

>>> # Rotations: positive n rotates to the right, negative n to the left
>>> deq = deque([5, 2, 3], maxlen=3)

>>> deq.rotate(1)
>>> deq
deque([3, 5, 2], maxlen=3)

>>> deq.rotate(2)
>>> deq
deque([5, 2, 3], maxlen=3)

>>> deq.rotate(-3)
>>> deq
deque([5, 2, 3], maxlen=3)

>>> # Linux tail implementation
>>> def tail(filename, n):
...     'Return the last n lines of a file'
...     with open(filename) as f:
...         return deque(f, n)
...
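To try tail() without an existing file, the sketch below writes a throwaway temporary file with ten lines (the file contents are made up for the demo). Iterating a file yields its lines, and a deque with maxlen=n keeps only the last n of them in memory.

```python
import os
import tempfile
from collections import deque


def tail(filename, n):
    'Return the last n lines of a file'
    with open(filename) as f:
        # A deque bounded to n items keeps only the last n lines read.
        return deque(f, n)


# Create a small temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    for i in range(1, 11):
        tmp.write(f"line {i}\n")
    path = tmp.name

last_three = tail(path, 3)
print(list(last_three))  # ['line 8\n', 'line 9\n', 'line 10\n']
os.remove(path)
```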

Deque

>>> import itertools
>>> from collections import deque
>>> def moving_average(iterable, n=3):
...     # Get an iterator from the iterable object
...     it = iter(iterable)
...     # Seed the window with the first n-1 items, plus a leading 0
...     # that the first popleft() will drop
...     deq = deque(itertools.islice(it, n - 1))
...     deq.appendleft(0)
...     tot = sum(deq)
...     # Slide the window: drop the oldest item, add the new one, yield the mean
...     for elem in it:
...         tot += elem - deq.popleft()
...         deq.append(elem)
...         yield tot / n
...

>>> raw_list = [x * 5 for x in range(10)]
>>> print("input:", raw_list)
input: [0, 5, 10, 15, 20, 25, 30, 35, 40, 45]

>>> average = moving_average(raw_list)
>>> print("output:", list(average))
output: [5.0, 10.0, 15.0, 20.0, 25.0, 30.0, 35.0, 40.0]


Defaultdict

A dict subclass that provides all of dict's methods, but calls a factory function to supply a default value whenever a missing key is accessed. This removes the usual "is the key already there?" boilerplate while keeping dict's speed.

>>> from collections import defaultdict
>>> s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
>>> d = defaultdict(list)
>>> for k, v in s:
...     d[k].append(v)
...
>>> sorted(d.items())
[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]
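Another common pattern is counting with defaultdict(int), since int() returns 0. The sketch compares it with the plain-dict version using setdefault(), on a made-up word list, and shows one caveat: merely reading a missing key inserts it.

```python
from collections import defaultdict

words = "the cat and the hat and the bat".split()

# Plain dict: every missing key has to be handled by hand.
plain = {}
for w in words:
    plain.setdefault(w, 0)
    plain[w] += 1

# defaultdict(int): missing keys start at int() == 0.
counts = defaultdict(int)
for w in words:
    counts[w] += 1

print(dict(counts) == plain)  # True

# Caveat: reading a missing key inserts it with the default value.
_ = counts["dog"]
print("dog" in counts)  # True
```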

Defaultdict

>>> # Dict whose missing keys get a random int between 0 and 3 (inclusive)
>>> from random import randint
>>> rand_dict = defaultdict(lambda: randint(0, 3))
>>> rand_dict
defaultdict(<function <lambda> at 0x7f9254f0b1b8>, {})

>>> for num in range(5):
...    rand_dict[num]

>>> rand_dict
defaultdict(<function <lambda> at 0x7f9254f0b1b8>, {0: 1, 1: 1, 2: 0, 3: 2, 4: 1})

Namedtuple

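A factory function for hybrid "tuple-dictionary" objects: it returns a tuple subclass characterized by:

a type name
access to the values both by field name and by index
no real keys (fields are just tuple positions), so instances are hashable like ordinary tuples when their values are hashable
immutable values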

Namedtuple

>>> from collections import namedtuple
>>> Point = namedtuple('Point', ['x', 'y'])  # Defining the namedtuple
>>> p = Point(10, y=20)  # Creating an object
>>> p
Point(x=10, y=20)
>>> p.x + p.y
30
>>> p[0] + p[1]  # Accessing the values in normal way
30
>>> x, y = p     # Unpacking the tuple
>>> x
10
>>> y
20

Namedtuple: in depth

>>> # verbose=True printed the generated class source; it was removed in Python 3.7
>>> Point = namedtuple('Point', ['x', 'y'], verbose=True)
class Point(tuple):
    'Point(x, y)'

    __slots__ = ()

    _fields = ('x', 'y')

    def __new__(_cls, x, y):
        'Create a new instance of Point(x, y)'
        return _tuple.__new__(_cls, (x, y))

    @classmethod
    def _make(cls, iterable, new=tuple.__new__, len=len):
        'Make a new Point object from a sequence or iterable'
        result = new(cls, iterable)
        if len(result) != 2:
            raise TypeError('Expected 2 arguments, got %d' % len(result))
        return result

    def __repr__(self):
        'Return a nicely formatted representation string'
        return 'Point(x=%r, y=%r)' % self

    def _asdict(self):
        'Return a new OrderedDict which maps field names to their values'
        return OrderedDict(zip(self._fields, self))

    def _replace(_self, **kwds):
        'Return a new Point object replacing specified fields with new values'
        result = _self._make(map(kwds.pop, ('x', 'y'), _self))
        if kwds:
            raise ValueError('Got unexpected field names: %r' % kwds.keys())
        return result

    def __getnewargs__(self):
        'Return self as a plain tuple.   Used by copy and pickle.'
        return tuple(self)

    __dict__ = _property(_asdict)

    def __getstate__(self):
        'Exclude the OrderedDict from pickling'
        pass

    x = _property(_itemgetter(0), doc='Alias for field number 0')

    y = _property(_itemgetter(1), doc='Alias for field number 1')
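The generated class above exposes a few public helpers (_make, _replace, _asdict, _fields). A short sketch of how they are used; note that _asdict() returned an OrderedDict before Python 3.8 and a regular dict since then, so the comparison below works either way.

```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])

p = Point._make([10, 20])   # build a Point from any iterable
q = p._replace(y=99)        # returns a NEW Point; p is unchanged
d = p._asdict()             # field name -> value mapping
print(p, q, d)
print(Point._fields)        # ('x', 'y')

# Like any tuple, a Point compares and hashes like a plain tuple.
print(p == (10, 20), hash(p) == hash((10, 20)))  # True True
```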

OrderedDict

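A dict subclass that remembers the order in which keys were inserted. (Since Python 3.7 regular dicts preserve insertion order too, but OrderedDict adds extras such as move_to_end() and popitem(last=False).)

>>> from collections import OrderedDict
>>> # regular unsorted dictionary
>>> d = {'banana': 3, 'apple': 4, 'pear': 1, 'orange': 2}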

>>> # dictionary sorted by key
>>> OrderedDict(sorted(d.items(), key=lambda t: t[0]))
OrderedDict([('apple', 4), ('banana', 3), ('orange', 2), ('pear', 1)])

>>> # dictionary sorted by value
>>> OrderedDict(sorted(d.items(), key=lambda t: t[1]))
OrderedDict([('pear', 1), ('orange', 2), ('banana', 3), ('apple', 4)])
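The order-manipulating methods make OrderedDict a natural fit for a least-recently-used (LRU) cache. The LRUCache class below is a hypothetical minimal sketch, not a library API; for caching function results, the standard library already provides functools.lru_cache.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal least-recently-used cache (illustrative sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key, default=None):
        if key not in self.data:
            return default
        self.data.move_to_end(key)         # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used


cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")             # "a" becomes the most recently used
cache.put("c", 3)          # over capacity: evicts "b"
print(list(cache.data))    # ['a', 'c']
```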

Collections ABC

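collections also offers many Abstract Base Classes (ABCs) that describe the interfaces of these containers: they are useful to check what an object supports with isinstance(), and to build your own collection by implementing a few abstract methods and inheriting the rest. Since Python 3.3 they live in collections.abc (the old collections.* aliases were removed in Python 3.10).

class collections.abc.Container
class collections.abc.Hashable
class collections.abc.Sized
class collections.abc.Callable
class collections.abc.Iterable
class collections.abc.Iterator
class collections.abc.Sequence
class collections.abc.MutableSequence
class collections.abc.Set
class collections.abc.MutableSet
class collections.abc.Mapping
class collections.abc.MutableMapping
class collections.abc.MappingView
class collections.abc.ItemsView
class collections.abc.KeysView
class collections.abc.ValuesView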
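As a sketch of building your own collection: subclassing Sequence and implementing only __getitem__ and __len__ gives you iteration, in, reversed(), index() and count() for free, and forgetting an abstract method makes instantiation fail. Countdown and Incomplete are hypothetical example classes.

```python
from collections.abc import Sequence


class Countdown(Sequence):
    """Read-only sequence start, start-1, ..., 1 (hypothetical example)."""

    def __init__(self, start):
        self.start = start

    def __len__(self):
        return self.start

    def __getitem__(self, index):
        if isinstance(index, slice):
            return [self[i] for i in range(*index.indices(len(self)))]
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError(index)
        return self.start - index


c = Countdown(5)
print(list(c))                          # [5, 4, 3, 2, 1]
print(3 in c, c.index(2), c.count(4))   # True 3 1  (mixin methods)
print(list(reversed(c)))                # [1, 2, 3, 4, 5]
print(isinstance(c, Sequence))          # True


class Incomplete(Sequence):
    pass  # __getitem__ and __len__ are missing


try:
    Incomplete()
    can_instantiate = True
except TypeError:
    can_instantiate = False
print(can_instantiate)  # False
```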

ChainMap

It quickly links several mappings so they can be treated as a single unit.

Features:

lookups search the underlying mappings in order and return the first match
the mappings are kept in a public, updatable list (the maps attribute), which is the only state
writes, updates and deletions only operate on the first mapping
the underlying mappings are stored by reference, so later changes to them are reflected in the ChainMap

ChainMap

>>> from collections import ChainMap
>>> dict1 = {"Name": "Andrea", "Surname": "Iuliano", "Age": 25}
>>> dict2 = {"Name": "Valerio", "Surname": "Rossi",
...          "Hobbies": ["skate", "snowboard", "reptiles", "videogames"]}

>>> chain = ChainMap(dict1, dict2)
>>> chain
ChainMap({'Name': 'Andrea', 'Surname': 'Iuliano', 'Age': 25},
         {'Name': 'Valerio', 'Surname': 'Rossi',
          'Hobbies': ['skate', 'snowboard', 'reptiles', 'videogames']})

>>> chain['Name']
'Andrea'
>>> chain['Hobbies']
['skate', 'snowboard', 'reptiles', 'videogames']
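A sketch of the write rules and of reference semantics, using a hypothetical settings example (defaults plus user overrides). new_child() pushes a fresh mapping in front, which is handy for nested scopes.

```python
from collections import ChainMap

defaults = {"color": "red", "user": "guest"}
overrides = {"user": "andrea"}

settings = ChainMap(overrides, defaults)
print(settings["user"], settings["color"])  # andrea red

# Writes only touch the first mapping.
settings["color"] = "blue"
print(overrides)  # {'user': 'andrea', 'color': 'blue'}
print(defaults)   # {'color': 'red', 'user': 'guest'}

# The mappings are stored by reference, so later changes show through.
defaults["theme"] = "dark"
print(settings["theme"])  # dark

# new_child() adds a new front mapping without touching the others.
local = settings.new_child({"user": "tmp"})
print(local["user"], settings["user"])       # tmp andrea
print(len(settings.maps), len(local.maps))   # 2 3
```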

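UserDict, UserList and UserString

These three classes are wrappers around dict, list and str objects: the wrapped object is stored in the data attribute, and they are meant to be subclassed to build your own dictionary-, list- or string-like classes.

To get this functionality you need to:

create a subclass of UserDict/UserList/UserString
initialize self.data with an empty object of the corresponding type (calling the base class __init__ does this for you)
modify the contents through the usual methods (e.g. self.update(...) for UserDict), which operate on self.data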

UserDict

>>> from collections import UserDict
>>> class User(UserDict):
...     def __init__(self, name, surname, age):
...         self.data = {}
...         self.name = name
...         self.surname = surname
...         self.age = age
...         self.update({"name": name, "surname": surname, "age": age})
...     def changeName(self, new_name):
...         self.name = new_name
...         self.update({"name": new_name})
...

>>> me = User("Andres", "Iuliano", 25)
>>> me.changeName("Andrea")
>>> print(me)
{'name': 'Andrea', 'surname': 'Iuliano', 'age': 25}
>>> print(me['age'])
25
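Why not subclass dict directly? dict's own update() and constructor do not call an overridden __setitem__, while UserDict routes every insertion through it. A minimal sketch with two hypothetical classes that lowercase their keys:

```python
from collections import UserDict


class LowerDict(dict):
    """dict subclass: update() bypasses the overridden __setitem__."""

    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), value)


class LowerUserDict(UserDict):
    """UserDict subclass: every insertion goes through __setitem__."""

    def __setitem__(self, key, value):
        self.data[key.lower()] = value


a = LowerDict()
a.update({"Name": "Andrea"})
b = LowerUserDict()
b.update({"Name": "Andrea"})

print(list(a))  # ['Name']: the override was ignored
print(list(b))  # ['name']
```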

Any questions?

The end, thank you!

github.com/Pausa90

andreaiuliano90@gmail.com

slides.com/pausa/deck/