Python's Data Structures: Basics and Advanced
PyRoma 2015
About Me
Full Name: Andrea Iuliano
Age: 25
Master's Degree: Computer Science
At: UniRoma3
Passions:
- Computer Vision
- Videogames
- Animals (reptile lover :D)
Contacts:
- github.com/Pausa90
- andreaiuliano90@gmail.com
Computer Vision Engineer
"if I can see it, it can see it"
More data structures?
We already have general-purpose built-in data structures that act as containers:
- list
- tuple
- set
- frozenset
- dict
List
The most basic data structure offered by Python.
Elements are stored in sequence and can be accessed by an integer index (e.g. list1[0]).
list1 = ['physics', 'chemistry', 1997, 2000]
list2 = [1, 2, 3, 4, 5]
list3 = ["a", "b", "c", "d"]
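A short sketch of index access on the first list above (it redefines list1 so it runs on its own): positive indexes count from the start, negative ones from the end, and slicing returns a new list.

```python
# Same values as list1 above
list1 = ['physics', 'chemistry', 1997, 2000]

first = list1[0]       # 'physics'
last = list1[-1]       # 2000: negative indexes count from the end
middle = list1[1:3]    # ['chemistry', 1997]: a new list, end index excluded

# Lists are mutable: elements can be replaced in place
list1[2] = 2001
print(first, last, middle, list1)
```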
List
Lists can also be used as stacks, giving easy LIFO (last in, first out) access through append() and pop().
>>> stack = [3, 4, 5]
>>> stack.append(6)
>>> stack.append(7)
>>> stack
[3, 4, 5, 6, 7]
>>> stack.pop()
7
>>> stack
[3, 4, 5, 6]
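A classic use of a list as a stack is checking that brackets are balanced: every closing bracket must match the most recently opened one. A minimal sketch; the function name is hypothetical.

```python
def is_balanced(text):
    """Return True if every bracket in text is closed in LIFO order."""
    pairs = {')': '(', ']': '[', '}': '{'}
    stack = []
    for ch in text:
        if ch in '([{':
            stack.append(ch)          # push an opening bracket
        elif ch in pairs:
            # the most recent opening bracket must match this closing one
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack                  # leftover openers mean unbalanced

print(is_balanced("[(1 + 2) * {3}]"))  # True
print(is_balanced("(]"))               # False
```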
Tuple
An immutable collection of elements: once a tuple is created, you cannot change its elements or its size.
>>> tupl = (1, 2, 3, 4, 5, 6, 7)
>>> tupl[0]
1
>>> tupl[0] = 10
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
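Because tuples cannot change, they are hashable (as long as their elements are), so unlike lists they can be used as dictionary keys. A small sketch with made-up grid coordinates:

```python
# Hypothetical grid: (row, column) tuples used as dictionary keys
grid = {(0, 0): 'start', (2, 3): 'goal'}
print(grid[(2, 3)])          # goal

row, col = (2, 3)            # tuple unpacking
print(row + col)             # 5

try:
    bad = {[0, 0]: 'start'}  # lists are mutable, hence unhashable
except TypeError as err:
    print(err)               # unhashable type: 'list'
```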
Set
An unordered collection with no duplicate elements. It supports mathematical operations like union, intersection and difference.
>>> set1 = set(range(0,5))
>>> set2 = set(range(3,8))
>>> set1
{0, 1, 2, 3, 4}
>>> set2
{3, 4, 5, 6, 7}
>>> set1.union(set2)
{0, 1, 2, 3, 4, 5, 6, 7}
>>> set1.intersection(set2)
{3, 4}
>>> set1.difference(set2)
{0, 1, 2}
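The same operations are also available as operators, and sets are the usual tool for removing duplicates and for fast membership tests. A sketch reusing the two sets above (sorted() is used only to make the printed order predictable):

```python
set1 = set(range(0, 5))
set2 = set(range(3, 8))

# Operator forms of the methods above
assert set1 | set2 == set1.union(set2)
assert set1 & set2 == set1.intersection(set2)
assert set1 - set2 == set1.difference(set2)

# Elements in exactly one of the two sets
print(sorted(set1 ^ set2))          # [0, 1, 2, 5, 6, 7]

# Removing duplicates and testing membership
unique = set([1, 1, 2, 3, 3, 3])
print(sorted(unique), 2 in unique)  # [1, 2, 3] True
```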
Frozenset
Same as set, but immutable: once created it cannot be changed, and it is hashable.
>>> frozen_set = frozenset(range(0,5))
>>> frozen_set
frozenset({0, 1, 2, 3, 4})
>>> # Only the non-mutating methods of set are available
>>> [name for name in dir(frozen_set) if not name.startswith('_')]
['copy', 'difference', 'intersection', 'isdisjoint', 'issubset', 'issuperset', 'symmetric_difference', 'union']
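Being hashable, a frozenset can be an element of another set or a dictionary key, which a regular set cannot. A sketch with made-up node names and weight:

```python
# A set cannot contain a (mutable) set...
try:
    {set([1, 2])}
except TypeError as err:
    print(err)                     # unhashable type: 'set'

# ...but frozensets are hashable, so they can be set elements or dict keys
pairs = {frozenset({'a', 'b'}), frozenset({'b', 'a'}), frozenset({'c'})}
print(len(pairs))                  # 2: {'a', 'b'} and {'b', 'a'} are equal

# Hypothetical undirected edge weights: the order of the endpoints does not matter
weights = {frozenset({'A', 'B'}): 5}
print(weights[frozenset({'B', 'A'})])  # 5
```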
Dict
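A dictionary is a collection of (key: value) pairs in which every key is unique: if a key is repeated, the last value wins. Since Python 3.7, dicts also remember insertion order.
>>> person = {'Name': 'Valerio', 'Age': 25, 'Name': 'Andrea'}  # not named dict, to avoid shadowing the built-in
>>> person
{'Name': 'Andrea', 'Age': 25}
>>> person['Name']
'Andrea'
>>> person.keys()
dict_keys(['Name', 'Age'])
>>> person.values()
dict_values(['Andrea', 25])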
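Looking up a missing key with [] raises KeyError; get() returns a default instead, which is the usual way to count or group with a plain dict (the collections classes that follow remove this boilerplate). A sketch:

```python
person = {'Name': 'Andrea', 'Age': 25}

print(person.get('Surname'))           # None, no KeyError
print(person.get('Surname', 'n/a'))    # n/a

# Counting with a plain dict: get() supplies 0 for unseen keys
counts = {}
for word in "the cat and the hat".split():
    counts[word] = counts.get(word, 0) + 1
print(counts)   # {'the': 2, 'cat': 1, 'and': 1, 'hat': 1}

# Iterating over (key, value) pairs
for key, value in person.items():
    print(key, '->', value)
```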
Advanced Collections
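"So why do we need them? We have useful primitive data structures that we can combine however we like!"
The correct answer is:
- code reuse
- fewer bugs: the implementations are widely used and tested
- good performance: in CPython several of them (deque, defaultdict, OrderedDict) are implemented in C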
Collections
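Available since Python 2:
- Counter (2.7)
- deque (2.4)
- defaultdict (2.5)
- namedtuple (2.6)
- OrderedDict (2.7)
Added to collections in Python 3:
- ChainMap (3.3)
- UserDict
- UserList
- UserString
(In Python 2, UserDict, UserList and UserString lived in separate modules with the same names.)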
Counter
A dict subclass for counting hashable objects: elements are stored as keys and their counts as values.
>>> from collections import Counter
>>> char_counter = Counter("Hi, it works even with strings")
>>> char_counter
Counter({' ': 5, 'i': 4, 't': 3, 's': 3, 'w': 2, 'r': 2, 'e': 2, 'n': 2,
         'H': 1, ',': 1, 'o': 1, 'k': 1, 'v': 1, 'h': 1, 'g': 1})
>>> string_counter = Counter(["Hi", ",", "it", "works", "even", "with",
...                           "strings"])
>>> string_counter
Counter({'Hi': 1, ',': 1, 'it': 1, 'works': 1, 'even': 1, 'with': 1,
         'strings': 1})
>>> string_counter.elements()
<itertools.chain object at 0x7fe9b37ebb10>
>>> list(string_counter.elements())
['Hi', ',', 'it', 'works', 'even', 'with', 'strings']
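A few more Counter behaviours worth knowing: missing keys count as zero instead of raising KeyError, update() adds counts rather than replacing them, and subtract() can push counts to zero or below. A sketch with made-up votes:

```python
from collections import Counter

votes = Counter(['red', 'blue', 'red'])
print(votes['green'])        # 0: missing keys count as zero, no KeyError

votes.update(['blue', 'blue', 'green'])   # adds counts instead of replacing
print(votes)                 # Counter({'blue': 3, 'red': 2, 'green': 1})

votes.subtract({'red': 3})   # counts may go to zero or below
print(votes['red'])          # -1
print(votes.most_common(1))  # [('blue', 3)]
```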
Counter
>>> # Find the ten most common words in Hamlet
>>> import re
>>> words = re.findall(r'\w+', open('hamlet.txt').read().lower())
>>> Counter(words).most_common(10)
[('the', 1143), ('and', 966), ('to', 762), ('of', 669), ('i', 631),
('you', 554), ('a', 546), ('my', 514), ('hamlet', 471), ('in', 451)]
>>> # Some mathematical operations
>>> c = Counter(a=3, b=1)
>>> d = Counter(a=1, b=2)
>>> c + d # add two counters together: c[x] + d[x]
Counter({'a': 4, 'b': 3})
>>> c - d # subtract (keeping only positive counts)
Counter({'a': 2})
>>> c & d # intersection: min(c[x], d[x])
Counter({'a': 1, 'b': 1})
>>> c | d # union: max(c[x], d[x])
Counter({'a': 3, 'b': 2})
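The Hamlet example needs a hamlet.txt file on disk; the same word-frequency idea can be sketched on an inline sample string:

```python
import re
from collections import Counter

text = "To be, or not to be, that is the question"
words = re.findall(r'\w+', text.lower())   # lower-case words, punctuation dropped
top = Counter(words).most_common(2)
print(top)   # [('to', 2), ('be', 2)]
```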
Deque
A double-ended queue: a generalization of stacks and queues.
Advantages:
- thread-safe appends and pops
- memory-efficient appends and pops from both ends, with approximately O(1) performance in either direction
- can be unbounded, or bounded with maxlen: when a bounded deque is full, adding an item discards one from the opposite end
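A common use of the O(1) popleft() is a FIFO queue, for example in a breadth-first search (list.pop(0) would be O(n) instead). A minimal sketch on a small hypothetical graph stored as a dict of adjacency lists:

```python
from collections import deque

def bfs_order(graph, start):
    """Return nodes in breadth-first order, starting from start."""
    visited = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()           # O(1), unlike list.pop(0)
        order.append(node)
        for neighbour in graph[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)  # enqueue on the right
    return order

graph = {'a': ['b', 'c'], 'b': ['d'], 'c': ['d'], 'd': []}
print(bfs_order(graph, 'a'))   # ['a', 'b', 'c', 'd']
```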
Deque
>>> from collections import deque
>>> deq = deque([], maxlen=3)
>>> deq.append(1)
>>> deq
deque([1], maxlen=3)
>>> deq.append(2)
>>> deq.append(3)
>>> deq
deque([1, 2, 3], maxlen=3)
>>> deq.append(4)
>>> deq
deque([2, 3, 4], maxlen=3)
>>> deq.appendleft(5)
>>> deq
deque([5, 2, 3], maxlen=3)
Deque
>>> # Rotations: a positive n rotates to the right, a negative n to the left
>>> deq = deque([5, 2, 3], maxlen=3)
>>> deq.rotate(1)
>>> deq
deque([3, 5, 2], maxlen=3)
>>> deq.rotate(2)
>>> deq
deque([5, 2, 3], maxlen=3)
>>> deq.rotate(-3)
>>> deq
deque([5, 2, 3], maxlen=3)
>>> # Linux tail implementation
>>> def tail(filename, n):
...     'Return the last n lines of a file'
...     with open(filename) as f:
...         return deque(f, n)
...
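To try tail() without an existing file, one can write a temporary file first. A sketch using the standard tempfile module (the ten numbered lines are just sample content):

```python
import os
import tempfile
from collections import deque

def tail(filename, n):
    'Return the last n lines of a file'
    with open(filename) as f:
        return deque(f, n)   # a full deque drops lines from the left

# Write ten numbered lines to a temporary file
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    for i in range(1, 11):
        tmp.write('line %d\n' % i)

last = tail(tmp.name, 3)
print(list(last))   # ['line 8\n', 'line 9\n', 'line 10\n']
os.remove(tmp.name)
```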
Deque
>>> import itertools
>>> from collections import deque
>>> def moving_average(iterable, n=3):
...     # Get an iterator from the iterable object
...     it = iter(iterable)
...     # Create a deque as an n-size window
...     deq = deque(itertools.islice(it, n - 1))
...     deq.appendleft(0)
...     tot = sum(deq)
...     # Iterate over the rest and yield the mean of each window
...     for elem in it:
...         popped = deq.popleft()
...         tot += elem - popped
...         deq.append(elem)
...         yield tot / n
...
>>> raw_list = [x * 5 for x in range(0, 10)]
>>> print("input:", raw_list)
input: [0, 5, 10, 15, 20, 25, 30, 35, 40, 45]
>>> average = moving_average(raw_list)
>>> print("output:", list(average))
output: [5.0, 10.0, 15.0, 20.0, 25.0, 30.0, 35.0, 40.0]
Defaultdict
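A dict subclass that calls a factory function (list, int, a lambda, ...) to supply a value for missing keys. It provides all dict methods, and the grouping code it allows is simpler than, and according to the Python docs faster than, the equivalent dict.setdefault() code.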
>>> from collections import defaultdict
>>> s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
>>> d = defaultdict(list)
>>> for k, v in s:
...     d[k].append(v)
...
>>> sorted(d.items())
[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]
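Two other common factories: int (which returns 0) for counting, and a lambda returning another defaultdict for nested structures. A sketch with sample data:

```python
from collections import defaultdict

# int() returns 0, so missing keys start at zero
counts = defaultdict(int)
for letter in "mississippi":
    counts[letter] += 1
print(dict(counts))   # {'m': 1, 'i': 4, 's': 4, 'p': 2}

# Nested defaultdicts build two-level structures on the fly
tree = defaultdict(lambda: defaultdict(list))
tree['fruits']['red'].append('apple')
tree['fruits']['yellow'].append('banana')
print(tree['fruits']['red'])   # ['apple']

# Membership tests ("in") and .get() do not create entries
print('vegetables' in tree)    # False
```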
Defaultdict
>>> from random import randint
>>> # A dict that fills missing keys with a random int between 0 and 3
>>> rand_dict = defaultdict(lambda: randint(0, 3))
>>> rand_dict
defaultdict(<function <lambda> at 0x7f9254f0b1b8>, {})
>>> for num in range(5):
...     _ = rand_dict[num]  # reading a missing key inserts a new value
...
>>> rand_dict
defaultdict(<function <lambda> at 0x7f9254f0b1b8>, {0: 1, 1: 1, 2: 0, 3: 2, 4: 1})
Namedtuple
A factory function for hybrid "tuple-dictionary" objects: it returns a tuple subclass characterized by:
- a name
- access by field name and by index
- no real keys (fields are properties over tuple positions) -> hashable like a tuple, with no per-instance dict
- immutable values
Namedtuple
>>> from collections import namedtuple
>>> Point = namedtuple('Point', ['x', 'y'])  # Defining the namedtuple
>>> p = Point(10, y=20)  # Creating an object
>>> p
Point(x=10, y=20)
>>> p.x + p.y
30
>>> p[0] + p[1]  # Accessing the values by index, as in a normal tuple
30
>>> x, y = p # Unpacking the tuple
>>> x
10
>>> y
20
Namedtuple: in depth
>>> # verbose=True prints the generated source (Python 2.7 up to 3.6; removed in 3.7)
>>> Point = namedtuple('Point', ['x', 'y'], verbose=True)
class Point(tuple):
    'Point(x, y)'
    __slots__ = ()
    _fields = ('x', 'y')
    def __new__(_cls, x, y):
        'Create a new instance of Point(x, y)'
        return _tuple.__new__(_cls, (x, y))
    @classmethod
    def _make(cls, iterable, new=tuple.__new__, len=len):
        'Make a new Point object from a sequence or iterable'
        result = new(cls, iterable)
        if len(result) != 2:
            raise TypeError('Expected 2 arguments, got %d' % len(result))
        return result
    def __repr__(self):
        'Return a nicely formatted representation string'
        return 'Point(x=%r, y=%r)' % self
    def _asdict(self):
        'Return a new OrderedDict which maps field names to their values'
        return OrderedDict(zip(self._fields, self))
    def _replace(_self, **kwds):
        'Return a new Point object replacing specified fields with new values'
        result = _self._make(map(kwds.pop, ('x', 'y'), _self))
        if kwds:
            raise ValueError('Got unexpected field names: %r' % kwds.keys())
        return result
    def __getnewargs__(self):
        'Return self as a plain tuple. Used by copy and pickle.'
        return tuple(self)
    __dict__ = _property(_asdict)
    def __getstate__(self):
        'Exclude the OrderedDict from pickling'
        pass
    x = _property(_itemgetter(0), doc='Alias for field number 0')
    y = _property(_itemgetter(1), doc='Alias for field number 1')
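The generated helpers shown in this source (_make, _fields, _asdict, _replace) can be used directly. A short sketch with the same Point class, assuming Python 3.8+, where _asdict() returns a plain dict rather than an OrderedDict:

```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])

p = Point._make([10, 20])      # build from any iterable
print(p._fields)               # ('x', 'y')
print(p._asdict())             # {'x': 10, 'y': 20}

q = p._replace(x=99)           # tuples are immutable: a new Point is returned
print(p, q)                    # Point(x=10, y=20) Point(x=99, y=20)

print(Point(**{'x': 1, 'y': 2}))  # Point(x=1, y=2), built from a dict
```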
OrderedDict
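A dict subclass that remembers the order in which items were inserted. Plain dicts also keep insertion order since Python 3.7, but OrderedDict still adds order-sensitive equality between OrderedDicts and methods such as move_to_end() and popitem(last=False).
>>> from collections import OrderedDict
>>> # regular unsorted dictionary
>>> d = {'banana': 3, 'apple': 4, 'pear': 1, 'orange': 2}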
>>> # dictionary sorted by key
>>> OrderedDict(sorted(d.items(), key=lambda t: t[0]))
OrderedDict([('apple', 4), ('banana', 3), ('orange', 2), ('pear', 1)])
>>> # dictionary sorted by value
>>> OrderedDict(sorted(d.items(), key=lambda t: t[1]))
OrderedDict([('pear', 1), ('orange', 2), ('banana', 3), ('apple', 4)])
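move_to_end() and popitem(last=False) make OrderedDict a natural base for a least-recently-used cache. A minimal sketch; the LRUCache class and its capacity parameter are illustrative, not a library API (the standard library's ready-made option for caching function results is functools.lru_cache):

```python
from collections import OrderedDict

class LRUCache:
    """Keep at most `capacity` items, evicting the least recently used."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key, default=None):
        if key not in self.items:
            return default
        self.items.move_to_end(key)          # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # drop the oldest entry

cache = LRUCache(2)
cache.put('a', 1)
cache.put('b', 2)
cache.get('a')             # 'a' is now the most recently used
cache.put('c', 3)          # evicts 'b'
print(list(cache.items))   # ['a', 'c']
```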
Collections ABC
The collections.abc module offers many Abstract Base Classes (ABCs) describing these interfaces. You can use them with isinstance() to check what an object supports, or subclass them to build your own collection and inherit many methods for free. They moved from collections to collections.abc in Python 3.3, and the old aliases were removed in Python 3.10.
class collections.abc.Container
class collections.abc.Hashable
class collections.abc.Sized
class collections.abc.Callable
class collections.abc.Iterable
class collections.abc.Iterator
class collections.abc.Sequence
class collections.abc.MutableSequence
class collections.abc.Set
class collections.abc.MutableSet
class collections.abc.Mapping
class collections.abc.MutableMapping
class collections.abc.MappingView
class collections.abc.ItemsView
class collections.abc.KeysView
class collections.abc.ValuesView
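Subclassing collections.abc.MutableMapping requires only five methods (__getitem__, __setitem__, __delitem__, __iter__, __len__); mixin methods such as get, update, keys, items, pop and setdefault come for free. A sketch of a hypothetical case-insensitive dictionary:

```python
from collections.abc import Mapping, MutableMapping

class CaseInsensitiveDict(MutableMapping):
    """Dictionary whose string keys are compared case-insensitively."""

    def __init__(self, *args, **kwargs):
        self._data = {}
        self.update(*args, **kwargs)       # update() is an inherited mixin

    def __getitem__(self, key):
        return self._data[key.lower()]

    def __setitem__(self, key, value):
        self._data[key.lower()] = value

    def __delitem__(self, key):
        del self._data[key.lower()]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

headers = CaseInsensitiveDict({'Content-Type': 'text/html'})
print(headers['content-type'])           # text/html
print(headers.get('CONTENT-TYPE'))       # text/html, get() is inherited
print(isinstance(headers, Mapping))      # True
```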
ChainMap
It quickly links several mappings so they can be treated as a single unit (added in Python 3.3). Lookups search the mappings in order and return the first match.
Features:
- the mappings are stored in a public, updatable list (the maps attribute), which is the only state
- writes, updates and deletions only operate on the first mapping
- the underlying mappings are stored by reference, so changes to them are immediately visible through the ChainMap
ChainMap
>>> from collections import ChainMap
>>> dict1 = {"Name": "Andrea", "Surname": "Iuliano", "Age": 25}
>>> dict2 = {"Name": "Valerio", "Surname": "Rossi",
...          "Hobbies": ["skate", "snowboard", "reptiles", "videogames"]}
>>> chain = ChainMap(dict1, dict2)
>>> chain
ChainMap({'Name': 'Andrea', 'Surname': 'Iuliano', 'Age': 25}, {'Name': 'Valerio', 'Surname': 'Rossi', 'Hobbies': ['skate', 'snowboard', 'reptiles', 'videogames']})
>>> chain['Name']  # found in dict1 first
'Andrea'
>>> chain['Hobbies']  # only in dict2
['skate', 'snowboard', 'reptiles', 'videogames']
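A typical use is layered lookup, such as user options overriding defaults. The option names below are made up; the sketch also shows that writes go only to the first mapping, that changes to an underlying dict are visible through the chain, and that new_child() pushes a fresh mapping in front:

```python
from collections import ChainMap

defaults = {'color': 'red', 'user': 'guest'}
user_options = {'user': 'andrea'}

config = ChainMap(user_options, defaults)
print(config['user'], config['color'])   # andrea red

config['color'] = 'blue'     # writes go to the first mapping only
print(user_options)          # {'user': 'andrea', 'color': 'blue'}
print(defaults['color'])     # red, untouched

defaults['theme'] = 'dark'   # mappings are held by reference
print(config['theme'])       # dark

local = config.new_child()   # push a new empty mapping in front
local['user'] = 'temp'
print(local['user'], config['user'])     # temp andrea
```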
UserDict, UserList and UserString
These three classes are wrappers around dict/list/str objects: the wrapped object is stored in the data attribute, which makes it easy to build your own dictionary-, list- or string-like classes.
To use them:
- create a subclass of UserDict/UserList/UserString
- initialize self.data with an object of the corresponding type (calling the base class __init__ does this for you)
- change the content through the usual methods (for example self.update() on a UserDict), which operate on self.data
UserDict
>>> from collections import UserDict
>>> class User(UserDict):
...     def __init__(self, name, surname, age):
...         super().__init__()  # sets self.data to an empty dict
...         self.update({"name": name, "surname": surname, "age": age})
...     def changeName(self, new_name):
...         self.update({"name": new_name})
...
>>> me = User("Andres", "Iuliano", 25)
>>> me.changeName("Andrea")
>>> print(me)
{'name': 'Andrea', 'surname': 'Iuliano', 'age': 25}
>>> print(me['age'])
25
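One practical reason to prefer UserDict over subclassing dict directly: UserDict's constructor and update() go through __setitem__, so an override applies everywhere, while dict's own constructor and update() bypass it in CPython. A sketch; both class names are hypothetical:

```python
from collections import UserDict

class LowerKeysDict(UserDict):
    """Store every key in lower case."""
    def __setitem__(self, key, value):
        # self.data is the real dict wrapped by UserDict
        self.data[key.lower()] = value

class LowerKeysPlainDict(dict):
    """Same idea, but subclassing dict directly."""
    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), value)

wrapped = LowerKeysDict({'Name': 'Andrea'})
wrapped.update(Surname='Iuliano')
print(wrapped)    # {'name': 'Andrea', 'surname': 'Iuliano'}

plain = LowerKeysPlainDict({'Name': 'Andrea'})
plain.update(Surname='Iuliano')
print(plain)      # {'Name': 'Andrea', 'Surname': 'Iuliano'}: the override was bypassed
```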
Any questions?
The end, thank you!
github.com/Pausa90
andreaiuliano90@gmail.com
slides.com/pausa/deck/
By Andrea Iuliano