Generators in Python

Arfat Salman
Software Engineer at Pesto Tech

@salman_arfat

 

Presented at PyDelhi, Gurgaon

Goals

  • Take a look at Python Generator Functions
  • They are a powerful feature that is often not utilised to its fullest
  • See how many constructs in Python use generators
  • See a few example implementations
  • Have fun!

Disclaimer

  • This is not an exhaustive tutorial. We are not going to learn every possible use case.
  • Instead, we are going to develop an intuitive understanding of where generators are useful and see some examples.
  • For complete coverage, consult the Python docs and books such as Learning Python (5E) by Mark Lutz.

Iterables and Iterators

  • An iterable is an object that has an __iter__ method which returns an iterator.
  • Iterators are objects that define a __next__ method. They
    • return the next value in the iteration
    • remember the state during iteration
    • update the state to point at the next value
    • signal when they are done by raising StopIteration
  • Iterables can also define a __getitem__ method that takes sequential indexes starting from zero (and raises an IndexError when the indexes are no longer valid).
class Range(object):
    def __init__(self, low, high):
        self.start = low
        self.end = high
    def __iter__(self):
        class Iterator(object):
            def __init__(self, low, high):
                self.start = low
                self.end = high
                self.current = low
            def __iter__(self):
                # Iterators should themselves be iterable: return self.
                return self
            def __next__(self):
                if self.current >= self.end:
                    raise StopIteration
                val = self.current
                self.current += 1
                return val
        
        return Iterator(self.start, self.end)

my_range = Range(2, 10)
range_iterator = iter(my_range)

print('Using iterator and next - ')
print(next(range_iterator)) # 2
print(next(range_iterator)) # 3
print(next(range_iterator)) # 4

print('Using iterable - ')
for val in my_range:
    print(val, end=' ')

# 2 3 4 5 6 7 8 9 
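
The __getitem__ fallback from the last bullet above can be sketched as follows. Squares is a hypothetical class made up for this illustration; it defines no __iter__ at all, yet iter() (and therefore list() and for) can still walk it by probing indexes 0, 1, 2, ... until IndexError is raised.

```python
class Squares:
    """Hypothetical sequence-style iterable: no __iter__, only __getitem__."""

    def __init__(self, count):
        self.count = count

    def __getitem__(self, index):
        # Python asks for index 0, 1, 2, ... and stops at the first IndexError.
        if index >= self.count:
            raise IndexError(index)
        return index * index


squares = list(Squares(5))
print(squares)  # [0, 1, 4, 9, 16]
```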

Iteration Protocol

class Range(object):
    def __init__(self, low, high):
        self.start = low
        self.end = high
        self.current = low

    def __iter__(self):
        return self

    def __next__(self):
        if self.current >= self.end:
            raise StopIteration
        val = self.current
        self.current += 1
        return val

my_range = Range(2,10)
range_iterator = iter(my_range)

print('Using iterator and next - ')
print(next(range_iterator)) # 2
print(next(range_iterator)) # 3
print(next(range_iterator)) # 4

print('Using iterable - ')
for val in my_range:
    print(val, end=' ')

# 5 6 7 8 9 

Iteration

  • Iteration is the process of applying iterators and getting sequential values.
  • Python's for loop uses iterators internally, if available.
  • An example - 
for elm in [1, 2, 'a', 'b']:
    print(elm)

# Results
1
2
a
b
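
Roughly, the for loop above behaves like the hand-written loop below. This is a simplified sketch of the protocol rather than the interpreter's actual implementation; the names iterator and seen are introduced only for this illustration.

```python
items = [1, 2, 'a', 'b']
seen = []

iterator = iter(items)          # a for loop calls iter() once, up front
while True:
    try:
        elm = next(iterator)    # ... and next() before every pass
    except StopIteration:
        break                   # exhaustion ends the loop silently
    seen.append(elm)            # stands in for the loop body

print(seen)  # [1, 2, 'a', 'b']
```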

Iteration in Python

  • Iterating over a Dict
details = {
    'first_name': 'Arfat',
    'last_name': 'Salman',
    'twitter_handle': '@salman_arfat'
}

for key in details:
    print(key, end=' ')

# first_name last_name twitter_handle
  • Iterating over a String
for key in 'abcdefghi':
    print(key, end=' ')

# a b c d e f g h i 
  • Iterating over a file
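with open('text.txt') as f:   # the with block closes the file when done
    for line in f:
        print(line, end='')   # each line already ends with a newline

# A line printed here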

More Iterable Consumers

  • Many functions consume iterables
  • Reductions
    • sum(iter)
    • min(iter)
    • max(iter)
  • Constructors
    • list(iter)
    • tuple(iter)
    • dict(iter)
    • set(iter)
  • in operator
  • other libraries such as itertools
  • Sequence unpacking
a, b, c = iter([1, 2, 3])
  • Note that you can only go forward in an iterator; there’s no way to get the previous element, reset the iterator, or make a copy of it.
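
A small sketch of that forward-only behaviour: each consumer below advances the same iterator, and nothing rewinds it. The variable names are chosen only for this example.

```python
numbers = iter([1, 2, 3, 4, 5])

found = 3 in numbers        # the search consumes 1, 2 and 3
remaining = list(numbers)   # only the items after the match are left
total = sum(numbers)        # the iterator is empty by now, so this is 0

print(found, remaining, total)  # True [4, 5] 0
```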

Generators

  • Generators are a special class of functions that simplify the task of writing iterators.
  • A generator is a function that produces a sequence of results instead of a single value, i.e., you generate a series of values.
  • Calling a generator function returns an object on which you can call next(); every call returns some value, until the object raises a StopIteration exception.
  • Such an object is an iterator (more specifically, a generator object).

Making Generators

def Range(start, end):
    current = start
    while current < end:
        yield current
        current += 1
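    # No explicit raise is needed: when the function returns, Python raises
    # StopIteration for the caller. Since Python 3.7 (PEP 479), raising
    # StopIteration inside a generator is turned into a RuntimeError.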

my_range_generator = Range(2,10)
range_iterator = iter(my_range_generator)

print(Range)
print(my_range_generator)
print('range_iterator is my_range_generator? - ',
    range_iterator is my_range_generator)

for val in my_range_generator:
    print(val, end=' ')

# <function Range at 0x10f6100d0>
# <generator object Range at 0x10f78bf68>
# range_iterator is my_range_generator? -  True
# 2 3 4 5 6 7 8 9 
  • Using yield inside an otherwise normal function turns it into a generator function
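
One consequence worth seeing in action: calling a generator function runs none of its body; the code only advances on each next() and pauses again at the following yield. The countdown function and the log list below are hypothetical names used just for this sketch.

```python
log = []

def countdown(start):
    log.append('started')                 # runs on the first next(), not on the call
    while start > 0:
        yield start                       # execution pauses here
        log.append('resumed after {}'.format(start))
        start -= 1

gen = countdown(2)
log_after_call = list(log)    # still empty: the body has not started yet

first = next(gen)             # runs up to the first yield
second = next(gen)            # resumes after the yield, loops, yields again

print(log_after_call, first, second, log)
```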

Generator Expressions

generating_func = (n for n in range(3, 9) if n > 5)

print(generating_func)
print(next(generating_func))
print(next(generating_func))
print(next(generating_func))

# <generator object <genexpr> at 0x102a0af68>
# 6
# 7
# 8

( expression for expr in sequence1
             if condition1
             ...
             for exprN in sequenceN
             if conditionN )
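
To make the general form concrete, here is a sketch showing that a generator expression with two for clauses and two if clauses yields the same values as the equivalent nested generator function. pairs_func and the sample ranges are invented for this illustration.

```python
def pairs_func(xs, ys):
    # Same nesting order as the clauses in the expression below.
    for x in xs:
        if x % 2 == 0:
            for y in ys:
                if y != x:
                    yield (x, y)

xs, ys = range(4), range(3)
pairs_expr = ((x, y) for x in xs if x % 2 == 0 for y in ys if y != x)

from_expr = list(pairs_expr)
from_func = list(pairs_func(xs, ys))
print(from_expr)  # [(0, 1), (0, 2), (2, 0), (2, 1)]
```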

Why Generators?

  • When a producer function has a hard enough job that it requires maintaining state between values produced, most programming languages offer no pleasant and efficient solution beyond adding a callback function to the producer's argument list, to be called with each value produced.
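
As a rough illustration of the contrast, the same stateful producer is written below in both styles. The function names and the running-total task are assumptions made for this sketch. With the callback, the producer drives the loop; with the generator, the consumer drives it and can stop early.

```python
def running_totals_callback(values, on_total):
    # Callback style: the producer calls the consumer for each value.
    total = 0
    for value in values:
        total += value
        on_total(total)

def running_totals(values):
    # Generator style: the state (total) survives between yields.
    total = 0
    for value in values:
        total += value
        yield total

collected = []
running_totals_callback([1, 2, 3], collected.append)

pulled = []
for total in running_totals([1, 2, 3]):
    if total > 3:
        break                 # the consumer decides when to stop
    pulled.append(total)

print(collected, pulled)  # [1, 3, 6] [1, 3]
```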

Generators vs Iterators

  • A generator is slightly different from other objects that support iteration.
  • A generator is a one-time operation. You can iterate over the generated data once, but if you want to do it again, you have to call the generator function again.
  • This is different from a list (which you can iterate over as many times as you want).
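
A minimal sketch of that difference, using a hypothetical evens generator function:

```python
def evens(limit):
    for n in range(0, limit, 2):
        yield n

gen = evens(6)
first_pass = list(gen)        # [0, 2, 4]
second_pass = list(gen)       # []: the generator is already exhausted
fresh_pass = list(evens(6))   # calling the function again starts over

as_list = [0, 2, 4]
list_passes = (list(as_list), list(as_list))   # a list can be walked repeatedly

print(first_pass, second_pass, fresh_pass)
```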

Advantages of Generators

  • Generators make lazy evaluation possible.
  • Generators are good for calculating large sets of results where you don't know if you are going to need all results.
  • Generators can be used to replace callbacks with iteration. When a function does a lot of work and needs to report back to the caller occasionally, it can simply yield at each point where it wants to report.
  • They are memory efficient since all the data need not be generated at once.
  • A non-obvious use of generators lets you do things like update a UI or run several jobs "simultaneously" (interleaved, actually) without using threads. [PEP 255]
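
The interleaving idea in the last bullet can be sketched with a tiny round-robin loop. This is a toy scheduler under simple assumptions, not a real concurrency framework; job and run_round_robin are hypothetical names.

```python
from collections import deque

def job(name, steps):
    # Each yield is a point where the job hands control back.
    for step in range(1, steps + 1):
        yield '{} step {}'.format(name, step)

def run_round_robin(jobs):
    queue = deque(jobs)
    while queue:
        current = queue.popleft()
        try:
            event = next(current)     # run the job up to its next yield
        except StopIteration:
            continue                  # job finished: do not requeue it
        yield event
        queue.append(current)         # back of the line for another turn

events = list(run_round_robin([job('A', 2), job('B', 3)]))
print(events)
# ['A step 1', 'B step 1', 'A step 2', 'B step 2', 'B step 3']
```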

Examples

def fib():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

import itertools
list(itertools.islice(fib(), 10))
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
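
# List-based version for comparison: builds all n values before returning.
def fib_list(n):
    a, b = 0, 1
    result = []
    for i in range(n):   # range, not xrange, in Python 3
        result.append(a)
        a, b = b, a + b
    return result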

When not to use Generators?

Use a list instead of a generator when:

  • You need to access the data multiple times (i.e. cache the results instead of recomputing them)
for i in outer:           # used once, okay to be a generator or return a list
    for j in inner:       # used multiple times, reusing a list is better
        ...
  • You need random access (or any access other than forward sequential order)
for i in reversed(data): ...     # generators aren't reversible

s[i], s[j] = s[j], s[i]          # generators aren't indexable
  • You need to join strings (which requires two passes over the data)
s = ''.join(data) # lists are faster than generators in this use case
# str.join makes one pass to add up the lengths of all the string fragments
# so it knows how much memory to allocate for the combined final result.
# The second pass copies the string fragments into
# the new buffer to create a single new string.
# If the input to join isn't a list, it has to
# do extra work to build a temporary list for the two passes.
  • You are using PyPy, which sometimes can't optimize generator code as much as it can with normal function calls and list manipulations.

Premature optimization is the root of all evil in programming. - Knuth

Implementations

def cycle(iterable):
    # cycle('ABCD') --> A B C D A B C D A B C D ...
    saved = list(iterable)
    while saved:
        for element in saved:
            yield element

def dropwhile(predicate, iterable):
    # dropwhile(lambda x: x<5, [1,4,6,4,1]) --> 6 4 1
    iterable = iter(iterable)
    for x in iterable:
        if not predicate(x):
            yield x
            break
    for x in iterable:
        yield x
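
For comparison, the standard library's itertools module ships its own cycle and dropwhile. The short usage sketch below relies on those stdlib versions; islice is used only to cut the infinite cycle short.

```python
import itertools

# cycle never ends on its own, so take just the first six items.
first_six = list(itertools.islice(itertools.cycle('ABC'), 6))

# dropwhile skips leading items while the predicate holds, then yields the rest.
tail = list(itertools.dropwhile(lambda x: x < 5, [1, 4, 6, 4, 1]))

print(first_six)  # ['A', 'B', 'C', 'A', 'B', 'C']
print(tail)       # [6, 4, 1]
```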

Thanks

Please feel free to contact me.

@salman_arfat

Generators in Python

A conceptual overview of generator functions, and how they relate to iterators in Python 3.
