Generators in Python
Arfat Salman
Software Engineer at Pesto Tech
@salman_arfat
Presented at PyDelhi, Gurgaon
Goals
- Take a look at Python Generator Functions
- It's a powerful feature that is often not utilised to its fullest
- See how many constructs in Python use generators
- See a few example implementations
- Have fun!
Disclaimer
- This is not an exhaustive tutorial. We are not going to learn every possible use case.
- Instead, we are going to develop an intuitive understanding of where generators are useful and see some examples.
- Consult Python Docs and books such as Learning Python (5E) by Mark Lutz.
Iterables and Iterators
- An iterable is an object that has an __iter__ method which returns an iterator.
- Iterators are objects that define a __next__ method. They
- return the next value in the iteration
- remember the state during iteration.
- update the state to point at the next value
- signal when it is done by raising StopIteration
- Iterables can also define a __getitem__ method that can take sequential indexes starting from zero (and raises an IndexError when the indexes are no longer valid).
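The __getitem__ fallback from the last bullet can be sketched roughly as follows. The class name Squares is hypothetical and only serves the illustration; it defines no __iter__ at all, so iteration falls back to indexing from zero.

```python
class Squares:
    """Iterable through __getitem__ only (hypothetical example class)."""

    def __init__(self, count):
        self.count = count

    def __getitem__(self, index):
        # Iteration calls this with 0, 1, 2, ... until IndexError is raised.
        if index < 0 or index >= self.count:
            raise IndexError(index)
        return index * index


squares = list(Squares(5))
print(squares)  # [0, 1, 4, 9, 16]
```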
class Range(object):
    def __init__(self, low, high):
        self.start = low
        self.end = high

    def __iter__(self):
        class Iterator(object):
            def __init__(self, low, high):
                self.start = low
                self.end = high
                self.current = low

            def __next__(self):
                if self.current >= self.end:
                    raise StopIteration
                val = self.current
                self.current += 1
                return val

        return Iterator(self.start, self.end)

my_range = Range(2, 10)
range_iterator = iter(my_range)
print('Using iterator and next - ')
print(next(range_iterator)) # 2
print(next(range_iterator)) # 3
print(next(range_iterator)) # 4
print('Using iterable - ')
for val in my_range:
    print(val, end=' ')
# 2 3 4 5 6 7 8 9
Iteration Protocol
class Range(object):
    def __init__(self, low, high):
        self.start = low
        self.end = high
        self.current = low

    def __iter__(self):
        return self

    def __next__(self):
        if self.current >= self.end:
            raise StopIteration
        val = self.current
        self.current += 1
        return val

my_range = Range(2, 10)
range_iterator = iter(my_range)
print('Using iterator and next - ')
print(next(range_iterator)) # 2
print(next(range_iterator)) # 3
print(next(range_iterator)) # 4
print('Using iterable - ')
for val in my_range:
    print(val, end=' ')
# 5 6 7 8 9
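The "5 6 7 8 9" output above happens because this Range is its own iterator, so every consumer shares one position. A small sketch of the consequence, using a hypothetical class OneShotRange with the same shape as that Range:

```python
class OneShotRange:
    """The iterable is its own iterator (hypothetical name, same design as above)."""

    def __init__(self, low, high):
        self.current = low
        self.end = high

    def __iter__(self):
        return self

    def __next__(self):
        if self.current >= self.end:
            raise StopIteration
        val = self.current
        self.current += 1
        return val


r = OneShotRange(2, 5)
first_pass = list(r)
second_pass = list(r)   # the shared state is already at the end
print(first_pass)       # [2, 3, 4]
print(second_pass)      # []
```

Returning a fresh iterator object from __iter__, as the first Range version does, avoids this.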
Iteration
- Iteration is the process of applying iterators and getting sequential values.
- Python's for loop uses iterators internally, if available.
- An example -
for elm in [1, 2, 'a', 'b']:
    print(elm)
# Results
# 1
# 2
# a
# b
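What the for loop does with the iterator can be approximated by hand. This is only a sketch of the behaviour, not the interpreter's actual implementation; the names items and seen are made up for the example.

```python
items = [1, 2, 'a', 'b']
seen = []

iterator = iter(items)          # the loop first asks the iterable for an iterator
while True:
    try:
        elm = next(iterator)    # then fetches values one at a time
    except StopIteration:       # until the iterator signals it is done
        break
    seen.append(elm)            # this line stands in for the loop body

print(seen)  # [1, 2, 'a', 'b']
```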
Iteration in Python
- Iterating over a Dict
details = {
    'first_name': 'Arfat',
    'last_name': 'Salman',
    'twitter_handle': '@salman_arfat'
}
for key in details:
    print(key, end=' ')
# first_name last_name twitter_handle
- Iterating over a String
for key in 'abcdefghi':
    print(key, end=' ')
# a b c d e f g h i
- Iterating over a file
with open('text.txt') as f:    # the with block closes the file afterwards
    for line in f:
        print(line)
# A line printed here
More Iterable Consumers
- Many functions consume iterables
- Reductions
- sum(iter)
- min(iter)
- max(iter)
- Constructors
- list(iter)
- tuple(iter)
- dict(iter)
- set(iter)
- in operator
- other libraries such as itertools
- Sequence unpacking
a,b,c = iter([1,2,3])
- Note that you can only go forward in an iterator; there’s no way to get the previous element, reset the iterator, or make a copy of it.
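A short sketch of the forward-only behaviour: each consumer picks up where the previous one stopped. The variable names are illustrative only.

```python
numbers = iter([1, 2, 3, 4, 5])

found = 3 in numbers      # the search consumes 1, 2 and 3
rest = list(numbers)      # only the unconsumed tail is left
total = sum(numbers)      # nothing remains, so the sum is 0

print(found, rest, total)  # True [4, 5] 0
```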
Generators
- Generators are a special class of functions that simplify the task of writing iterators.
- A generator is a function that produces a sequence of results instead of a single value, i.e. you generate a series of values.
- A generator is simply a function which returns an object on which you can call next(), such that for every call it returns some value, until it raises a StopIteration exception.
- Such an object is called an iterator.
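A minimal sketch of that behaviour, using a hypothetical countdown function: calling it returns a generator object, and each next() call runs the body up to the following yield.

```python
def countdown(n):
    # Hypothetical example; nothing in here runs until next() is called.
    while n > 0:
        yield n
        n -= 1


gen = countdown(2)
first = next(gen)
second = next(gen)
try:
    next(gen)               # the body has finished
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, exhausted)  # 2 1 True
```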
Making Generators
def Range(start, end):
    current = start
    while current < end:
        yield current
        current += 1
    # Falling off the end finishes the generator. An explicit
    # `raise StopIteration` here becomes a RuntimeError in Python 3.7+ (PEP 479).

my_range_generator = Range(2, 10)
range_iterator = iter(my_range_generator)
print(Range)
print(my_range_generator)
print('range_iterator is my_range_generator? - ',
      range_iterator is my_range_generator)
for val in my_range_generator:
    print(val, end=' ')
# <function Range at 0x10f6100d0>
# <generator object Range at 0x10f78bf68>
# range_iterator is my_range_generator? - True
# 2 3 4 5 6 7 8 9
- Use yield in a normal function
Generator Expressions
generating_func = (n for n in range(3, 9) if n > 5)
print(generating_func)
print(next(generating_func))
print(next(generating_func))
print(next(generating_func))
# <generator object <genexpr> at 0x102a0af68>
# 6
# 7
# 8
( expression for expr in sequence1
             if condition1
             ...
             for exprN in sequenceN
             if conditionN )
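The general form reads left to right like nested loops. A small sketch with two for clauses and one condition, next to a hand-written generator function that should produce the same values (pairs_loop is a name made up for the comparison):

```python
pairs_gen = ((x, y) for x in range(3) if x != 1 for y in 'ab')


def pairs_loop():
    # The same clauses written as nested statements, in the same order.
    for x in range(3):
        if x != 1:
            for y in 'ab':
                yield (x, y)


from_expr = list(pairs_gen)
from_loop = list(pairs_loop())
print(from_expr)  # [(0, 'a'), (0, 'b'), (2, 'a'), (2, 'b')]
```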
Why Generators?
- When a producer function has a hard enough job that it requires maintaining state between values produced, most programming languages offer no pleasant and efficient solution beyond adding a callback function to the producer's argument list, to be called with each value produced.
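To make the contrast concrete, here is a sketch of the same producer written both ways. The task (flattening nested lists) and the names flatten_cb and flatten_gen are assumptions chosen for illustration; they do not come from PEP 255.

```python
def flatten_cb(nested, callback):
    """Callback style: the producer pushes each leaf to the caller's function."""
    for item in nested:
        if isinstance(item, list):
            flatten_cb(item, callback)
        else:
            callback(item)


def flatten_gen(nested):
    """Generator style: the producer yields each leaf and pauses in place."""
    for item in nested:
        if isinstance(item, list):
            yield from flatten_gen(item)
        else:
            yield item


data = [1, [2, [3, 4]], 5]
collected = []
flatten_cb(data, collected.append)
generated = list(flatten_gen(data))
print(collected)  # [1, 2, 3, 4, 5]
print(generated)  # [1, 2, 3, 4, 5]
```

With the generator, the consumer stays in an ordinary loop and can stop early; with the callback, control stays inside the producer until it returns.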
Generators vs Iterators
- A generator function is slightly different from an object that supports iteration.
- A generator is a one-time operation. You can iterate over the generated data once, but if you want to do it again, you have to call the generator function again.
- This is different from a list (which you can iterate over as many times as you want).
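A quick sketch of the difference, with a hypothetical two-value generator function named letters:

```python
def letters():
    yield 'a'
    yield 'b'


gen = letters()
first = list(gen)         # consumes the generator
second = list(gen)        # already exhausted, so this is empty
fresh = list(letters())   # calling the function again starts over

as_list = ['a', 'b']
passes = [list(as_list), list(as_list)]   # a list can be walked repeatedly

print(first, second, fresh)  # ['a', 'b'] [] ['a', 'b']
```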
Advantages of Generators
- Generators make lazy evaluation possible.
- Generators are good for calculating large sets of results where you don't know if you are going to need all results.
- Generators can be used to replace callbacks with iteration. You may occasionally report back to the caller. With generators, you yield when you want to report.
- They are memory efficient, since all the data need not be generated at once.
- A non-obvious use of generators lets you do things like update UI or run several jobs "simultaneously" (interleaved, actually) while not using threads. [PEP 255]
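The interleaving idea in the last bullet can be sketched with a tiny round-robin loop. The job and run_round_robin helpers are hypothetical names for this illustration; real schedulers are considerably more involved.

```python
from collections import deque


def job(name, steps):
    for i in range(steps):
        yield f'{name}:{i}'     # hand control back after each step


def run_round_robin(jobs):
    queue = deque(jobs)
    log = []
    while queue:
        current = queue.popleft()
        try:
            log.append(next(current))   # run one step of this job
        except StopIteration:
            continue                    # job finished, drop it
        queue.append(current)           # otherwise put it back in line
    return log


log = run_round_robin([job('A', 2), job('B', 3)])
print(log)  # ['A:0', 'B:0', 'A:1', 'B:1', 'B:2']
```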
Examples
def fib():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

import itertools
list(itertools.islice(fib(), 10))
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
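def fib(n):
    # Eager version for comparison: builds the whole list before returning.
    a = b = 1
    result = []
    for i in range(n):    # range in Python 3 (xrange was Python 2 only)
        result.append(a)
        a, b = b, a + b
    return result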
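With the infinite generator version, the caller decides how much to consume, which the list version cannot offer without knowing n up front. A sketch using itertools.takewhile to stop at a value bound rather than a count (the bound of 100 is arbitrary):

```python
import itertools


def fib():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


# Take values only while they stay below 100; fib() itself never ends.
below_100 = list(itertools.takewhile(lambda x: x < 100, fib()))
print(below_100)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```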
When not to use Generators?
Use a list instead of a generator when:
- You need to access the data multiple times (i.e. cache the results instead of recomputing them)
for i in outer:        # used once, okay to be a generator or return a list
    for j in inner:    # used multiple times, reusing a list is better
        ...
- You need random access (or any access other than forward sequential order)
for i in reversed(data): ...    # generators aren't reversible
s[i], s[j] = s[j], s[i]         # generators aren't indexable
- You need to join strings (which requires two passes over the data)
s = ''.join(data)    # lists are faster than generators in this use case
# str.join makes one pass to add up the lengths of all the string fragments
# so it knows how much memory to allocate for the combined final result.
# The second pass copies the string fragments into
# the new buffer to create a single new string.
# If the input to join isn't a list, it has to
# do extra work to build a temporary list for the two passes.
- You are using PyPy, which sometimes can't optimize generator code as much as it can with normal function calls and list manipulations.
Premature optimization is the root of all evil in programming. - Knuth
Implementations
def cycle(iterable):
    # cycle('ABCD') --> A B C D A B C D A B C D ...
    saved = list(iterable)
    while saved:
        for element in saved:
            yield element

def dropwhile(predicate, iterable):
    # dropwhile(lambda x: x<5, [1,4,6,4,1]) --> 6 4 1
    iterable = iter(iterable)
    for x in iterable:
        if not predicate(x):
            yield x
            break
    for x in iterable:
        yield x
Thanks
Please feel free to contact me.
@salman_arfat
A conceptual overview of generator functions, and how they relate to iterators in Python 3.