Elixir 1.6
Chapter 10
Processing Collections—Enum and Stream
Collections
Elixir comes with a number of types that act as collections: lists, maps, ranges, files and even functions.
Collections share something:
- you can iterate through them
- you can add things to them (sometimes)
Things that can be iterated are said to implement the Enumerable protocol.
Elixir provides two modules that have a bunch of iteration functions:
- Enum module
- Stream module
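As a small sketch of what implementing Enumerable buys you, the same Enum functions accept a list, a range, and a map. The module name EnumerableDemo is made up for illustration.

```elixir
defmodule EnumerableDemo do
  # Enum.count/1 works on anything that implements the Enumerable protocol.
  def sizes do
    [
      Enum.count([10, 20, 30]),   # a list
      Enum.count(1..5),           # a range
      Enum.count(%{a: 1, b: 2})   # a map, enumerated as {key, value} tuples
    ]
  end

  # Converting a map to a list shows the {key, value} tuples it yields.
  def map_pairs, do: Enum.to_list(%{a: 1, b: 2})
end

IO.inspect(EnumerableDemo.sizes())      #=> [3, 5, 2]
IO.inspect(EnumerableDemo.map_pairs())  #=> [a: 1, b: 2]
```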
Enum—Processing Collections
The Enum module is probably the most used of all the Elixir libraries. Employ it to iterate, filter, combine, split, and otherwise manipulate collections.
# Convert any collection into a list:
iex> list = Enum.to_list 1..5
[1, 2, 3, 4, 5]
# Concatenate collections:
iex> Enum.concat([1,2,3], [4,5,6])
[1, 2, 3, 4, 5, 6]
iex> Enum.concat [1,2,3], 'abc'
[1, 2, 3, 97, 98, 99]
# Create collections whose elements are some function of the original:
iex> Enum.map(list, &(&1 * 10))
[10, 20, 30, 40, 50]
iex> Enum.map(list, &String.duplicate("*", &1))
["*", "**", "***", "****", "*****"]
Enum—Processing Collections
# Select elements by position or criteria:
iex> Enum.at(10..20, 3)
13
iex> Enum.at(10..20, 20)
nil
iex> Enum.at(10..20, 20, :no_one_here)
:no_one_here
iex> Enum.filter(list, &(&1 > 2))
[3, 4, 5]
iex> require Integer # to get access to is_even
Integer
iex> Enum.filter(list, &Integer.is_even/1)
[2, 4]
iex> Enum.reject(list, &Integer.is_even/1)
[1, 3, 5]
# Sort and compare elements:
iex> Enum.sort ["there", "was", "a", "crooked", "man"]
["a", "crooked", "man", "there", "was"]
iex> Enum.sort ["there", "was", "a", "crooked", "man"],
...> &(String.length(&1) <= String.length(&2))
["a", "was", "man", "there", "crooked"]
iex> Enum.max ["there", "was", "a", "crooked", "man"]
"was"
iex> Enum.max_by ["there", "was", "a", "crooked", "man"], &String.length/1
"crooked"
A Note on Sorting
In our example of sort, we used:
iex> Enum.sort ["there", "was", "a", "crooked", "man"],
...> &(String.length(&1) <= String.length(&2))
It’s important to use <= and not just < if you want the sort to be stable.
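A short sketch of what stability means here: words of equal length ("was" and "man") keep their original relative order when the comparison uses <=. The module name SortDemo is hypothetical. Enum.sort_by is included for comparison, since its default sorter is also <=.

```elixir
defmodule SortDemo do
  @words ["there", "was", "a", "crooked", "man"]

  # With <=, equal-length words stay in their original relative order.
  def stable do
    Enum.sort(@words, &(String.length(&1) <= String.length(&2)))
  end

  # Enum.sort_by/2 compares the mapped values with <= by default.
  def by_length do
    Enum.sort_by(@words, &String.length/1)
  end
end

IO.inspect(SortDemo.stable())     #=> ["a", "was", "man", "there", "crooked"]
IO.inspect(SortDemo.by_length())  #=> ["a", "was", "man", "there", "crooked"]
```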
Enum—Processing Collections
# Merge collections:
iex> Enum.zip(list, [:a, :b, :c])
[{1, :a}, {2, :b}, {3, :c}]
iex> Enum.with_index(["once", "upon", "a", "time"])
[{"once", 0}, {"upon", 1}, {"a", 2}, {"time", 3}]
# Fold elements into a single value:
iex> Enum.reduce(1..100, &(&1+&2))
5050
iex> Enum.reduce(["now", "is", "the", "time"],fn word, longest ->
...> if String.length(word) > String.length(longest) do
...> word
...> else
...> longest
...> end
...> end)
"time"
iex> Enum.reduce(["now", "is", "the", "time"], 0, fn word, longest ->
...> if String.length(word) > longest,
...> do: String.length(word),
...> else: longest
...> end)
4
Enum—Processing Collections
# Deal a hand of cards:
iex> import Enum
iex> deck = for rank <- '23456789TJQKA', suit <- 'CDHS', do: [suit,rank]
['C2', 'D2', 'H2', 'S2', 'C3', 'D3', ... ]
iex> deck |> shuffle |> take(13)
['DQ', 'S6', 'HJ', 'H4', 'C7', 'D6', 'SJ', 'S9', 'D7', 'HA', 'S4', 'C2', 'CT']
iex> hands = deck |> shuffle |> chunk_every(13)
[['D8', 'CQ', 'H2', 'H3', 'HK', 'H9', 'DK', 'S9', 'CT', 'ST', 'SK', 'D2', 'HA'],
['C5', 'S3', 'CK', 'HQ', 'D3', 'D4', 'CA', 'C8', 'S6', 'DQ', 'H5', 'S2', 'C4'],
['C7', 'C6', 'C2', 'D6', 'D7', 'SA', 'SQ', 'H8', 'DT', 'C3', 'H7', 'DA', 'HT'],
['S5', 'S4', 'C9', 'S8', 'D5', 'H4', 'S7', 'SJ', 'HJ', 'D9', 'DJ', 'CJ', 'H6']]
Streams—Lazy Enumerables
[ 1, 2, 3, 4, 5 ]
|> Enum.map(&(&1*&1))
|> Enum.with_index
|> Enum.map(fn {value, index} -> value - index end)
|> IO.inspect #=> [1,3,7,13,21]
The Enum module is greedy. When you pass it a collection, it potentially consumes all the contents of that collection. It also means the result will typically be another collection.
So, this pipeline generates four lists on its way to outputting the final result.
Streams—Lazy Enumerables
IO.puts File.read!("/usr/share/dict/words")
|> String.split
|> Enum.max_by(&String.length/1)
In this case, we read the whole dictionary into memory (on my machine that’s 2.4MB), then split into a list of words (236,000 of them) before processing it to find the longest (which happens to be formaldehydesulphoxylate).
What we really want is to process the elements in the collection as we need them.
A Stream Is a Composable Enumerator
iex> s = Stream.map [1, 3, 5, 7], &(&1 + 1)
#Stream<[enum: [1, 3, 5, 7], funs: [#Function<37.75994740/1 in Stream.map/2>] ]>
If we’d called Enum.map, we’d have seen the result [2,4,6,8] come back immediately.
Instead we get back a stream value that contains a specification of what we intended.
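One way to convince yourself that nothing runs until the stream is consumed is to count how often the mapping function is called. The sketch below uses an Agent as a counter; the module name LazyDemo and the counting approach are illustrative assumptions, not part of the original slides.

```elixir
defmodule LazyDemo do
  def run do
    # An Agent holding a call counter, starting at zero.
    {:ok, counter} = Agent.start_link(fn -> 0 end)

    # Building the stream only records the intent; the function is not called yet.
    stream =
      Stream.map([1, 3, 5, 7], fn x ->
        Agent.update(counter, &(&1 + 1))
        x + 1
      end)

    calls_before = Agent.get(counter, & &1)
    result = Enum.to_list(stream)          # consuming the stream runs the function
    calls_after = Agent.get(counter, & &1)
    Agent.stop(counter)

    {calls_before, result, calls_after}
  end
end

IO.inspect(LazyDemo.run())  #=> {0, [2, 4, 6, 8], 4}
```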
Streams
How do we get the stream to start giving us results? Treat it as a collection and pass it to a function in the Enum module:
iex> s = Stream.map [1, 3, 5, 7], &(&1 + 1)
#Stream<...>
iex> Enum.to_list s
[2, 4, 6, 8]
Composable Streams
iex> squares = Stream.map [1, 2, 3, 4], &(&1*&1)
#Stream<[enum: [1, 2, 3, 4],
funs: [#Function<32.133702391 in Stream.map/2>] ]>
iex> plus_ones = Stream.map squares, &(&1+1)
#Stream<[enum: [1, 2, 3, 4],
funs: [#Function<32.133702391 in Stream.map/2>,
#Function<32.133702391 in Stream.map/2>] ]>
iex> odds = Stream.filter plus_ones, fn x -> rem(x,2) == 1 end
#Stream<[enum: [1, 2, 3, 4],
funs: [#Function<26.133702391 in Stream.filter/2>,
#Function<32.133702391 in Stream.map/2>,
#Function<32.133702391 in Stream.map/2>] ]>
iex> Enum.to_list odds
[5, 17]
Because streams are enumerable, you can also pass a stream to a stream function. Because of this, we say that streams are composable.
Streams
In real life we would write this:
[1,2,3,4]
|> Stream.map(&(&1*&1))
|> Stream.map(&(&1+1))
|> Stream.filter(fn x -> rem(x,2) == 1 end)
|> Enum.to_list
We’re never creating intermediate lists—we’re just passing successive elements of each of the collections to the next in the chain.
Streams
Streams aren’t only for lists. More and more Elixir modules now support streams. Here’s our longest-word code written using streams:
IO.puts File.open!("/usr/share/dict/words")
|> IO.stream(:line)
|> Enum.max_by(&String.length/1)
# there's a shortcut
IO.puts File.stream!("/usr/share/dict/words") |> Enum.max_by(&String.length/1)
The magic here is the call to IO.stream, which converts an IO device (in this case the open file) into a stream that serves one line at a time.
The good news is that there is no intermediate storage. The bad news is that it runs about two times slower than the previous version.
Infinite Streams
iex> Enum.map(1..10_000_000, &(&1+1)) |> Enum.take(5)
[2, 3, 4, 5, 6]
# takes about 8 seconds
iex> Stream.map(1..10_000_000, &(&1+1)) |> Enum.take(5)
[2, 3, 4, 5, 6]
# the result comes back instantaneously
Because streams are lazy, there’s no need for the whole collection to be available up front.
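Taken to its limit, the source does not need to end at all. The sketch below builds a pipeline on an unbounded sequence and takes only the first few results. It relies on Stream.iterate, and the module name InfiniteDemo is hypothetical.

```elixir
defmodule InfiniteDemo do
  # Stream.iterate/2 never ends; Enum.take/2 pulls only as many values as it needs.
  def first_odd_squares(n) do
    Stream.iterate(1, &(&1 + 1))
    |> Stream.map(&(&1 * &1))
    |> Stream.filter(&(rem(&1, 2) == 1))
    |> Enum.take(n)
  end
end

IO.inspect(InfiniteDemo.first_odd_squares(5))  #=> [1, 9, 25, 49, 81]
```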
Creating Your Own Streams
Streams are implemented solely in Elixir libraries—there is no specific runtime support.
The actual implementation is complex so you probably want to use some helpful wrapper functions to do the heavy lifting. There are a number of these, including
- cycle
- repeatedly
- iterate
- unfold
- resource
Stream.cycle
Stream.cycle takes an enumerable and returns an infinite stream containing that enumerable’s elements. When it gets to the end, it repeats from the beginning, indefinitely.
iex> Stream.cycle(~w{ green white }) |>
...> Stream.zip(1..5) |>
...> Enum.map(fn {class, value} ->
...> ~s{<tr class="#{class}"><td>#{value}</td></tr>\n} end) |>
...> IO.puts
<tr class="green"><td>1</td></tr>
<tr class="white"><td>2</td></tr>
<tr class="green"><td>3</td></tr>
<tr class="white"><td>4</td></tr>
<tr class="green"><td>5</td></tr>
Stream.repeatedly
Stream.repeatedly takes a function and invokes it each time a new value is wanted.
iex> Stream.repeatedly(fn -> true end) |> Enum.take(3)
[true, true, true]
iex> Stream.repeatedly(&:rand.uniform/0) |> Enum.take(3)
[0.7230402056221108, 0.94581636451987, 0.5014907142064751]
Stream.iterate
Stream.iterate(start_value, next_fun) generates an infinite stream. The first value is start_value. The next value is generated by applying next_fun to this value. This continues for as long as the stream is being used, with each value being the result of applying next_fun to the previous value.
iex> Stream.iterate(0, &(&1+1)) |> Enum.take(5)
[0, 1, 2, 3, 4]
iex> Stream.iterate(2, &(&1*&1)) |> Enum.take(5)
[2, 4, 16, 256, 65536]
iex> Stream.iterate([], &[&1]) |> Enum.take(5)
[[], [[]], [[[]]], [[[[]]]], [[[[[]]]]]]
Stream.unfold
Stream.unfold is related to iterate, but you can be more explicit both about the values output to the stream and about the values passed to the next iteration.
You supply an initial state and a function whose general form is
fn state -> { stream_value, new_state } end
iex> Stream.unfold({0,1}, fn {f1,f2} -> {f1, {f2, f1+f2}} end) |> Enum.take(15)
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377]
Stream.unfold
The function uses the argument to create two values, returned as a tuple.
The first is the value to be returned by this iteration of the stream, and the second is the value to be passed to the function on the next iteration of the stream.
Stream.unfold (cont.)
Here the state is a tuple containing the current and the next number in the sequence. We seed it with the initial state of {0, 1}. The value each iteration of the stream returns is the first of the state values. The new state moves one down the sequence, so an initial state of {f1,f2} becomes a new state of {f2,f1+f2}.
iex> Stream.unfold({0,1}, fn {f1,f2} -> {f1, {f2, f1+f2}} end) |> Enum.take(15)
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377]
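The Fibonacci stream never ends on its own. As a complementary sketch, the function passed to Stream.unfold can also return nil to end the stream, which gives a finite sequence. The module name UnfoldDemo is made up for illustration.

```elixir
defmodule UnfoldDemo do
  def countdown(from) do
    Stream.unfold(from, fn
      0 -> nil          # returning nil ends the stream
      n -> {n, n - 1}   # emit n, carry n - 1 forward as the new state
    end)
    |> Enum.to_list()
  end
end

IO.inspect(UnfoldDemo.countdown(5))  #=> [5, 4, 3, 2, 1]
```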
Stream.resource
How would you implement something like File.stream! yourself? Use Stream.resource:
Stream.resource(fn -> File.open!("sample") end,
                fn file ->
                  case IO.read(file, :line) do
                    data when is_binary(data) -> {[data], file}
                    _ -> {:halt, file}
                  end
                end,
                fn file -> File.close(file) end)
Stream.resource takes three functions as arguments:
- Open the resource
- Read from the resource
- Close/Cleanup the resource
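To try the three-function pattern without a file on disk, here is a sketch that reads lines from an in-memory StringIO device instead. Swapping in StringIO and the module name LineStream are assumptions made so the example is self-contained. The open, read, and close roles are the same as above.

```elixir
defmodule LineStream do
  def lines(contents) do
    Stream.resource(
      # 1. Open the resource: an in-memory IO device holding `contents`.
      fn ->
        {:ok, device} = StringIO.open(contents)
        device
      end,
      # 2. Read from the resource: emit one line, or halt at end of input.
      fn device ->
        case IO.read(device, :line) do
          data when is_binary(data) -> {[data], device}
          _ -> {:halt, device}
        end
      end,
      # 3. Clean up: close the device once the stream is done or abandoned.
      fn device -> StringIO.close(device) end
    )
  end
end

IO.inspect(LineStream.lines("one\ntwo\nthree\n") |> Enum.to_list())
#=> ["one\n", "two\n", "three\n"]
```

Because the stream is lazy, Enum.take(2) on the same stream reads only two lines before the cleanup function runs.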
Stream.resource
Let's create a timer.
defmodule Countdown do
  def sleep(seconds) do
    receive do
      after seconds*1000 -> nil
    end
  end
  def say(text) do
    spawn fn -> :os.cmd('say #{text}') end
  end
  def timer do
    Stream.resource(
      fn ->          # the number of seconds to the start of the next minute
        {_h,_m,s} = :erlang.time
        60 - s - 1
      end,
      fn             # wait for the next second, then return its countdown
        0 ->
          {:halt, 0}
        count ->
          sleep(1)
          { [inspect(count)], count - 1 }
      end,
      fn _ -> nil end   # nothing to deallocate
    )
  end
end
Stream.resource
$ iex countdown.exs
iex> counter = Countdown.timer
#Function<17.133702391 in Stream.resource/3>
iex> printer = counter |> Stream.each(&IO.puts/1)
#Stream<[enum: #Function<17.133702391 in Stream.resource/3>,
funs: [#Function<0.133702391 in Stream.each/2>] ]>
iex> speaker = printer |> Stream.each(&Countdown.say/1)
#Stream<[enum: #Function<17.133702391 in Stream.resource/3>,
funs: [#Function<0.133702391 in Stream.each/2>,
#Function<0.133702391 in Stream.each/2>] ]>
iex> speaker |> Enum.take(5)
37 ** numbers are output once
36 ** per second. Even cooler, the
35 ** computer says
34 ** "thirty seven", "thirty six"…
33
["37", "36", "35", "34", "33"]
# wait a few seconds
iex> speaker |> Enum.take(3)
29
28
27
["29", "28", "27"]
Streams in Practice
In the same way that functional programming requires you to look at problems in a new way, streams ask you to look at iteration and collections afresh.
Consider using a stream when you want to defer processing until you need the data, and when you need to deal with large numbers of things without necessarily generating them all at once.
The Collectable Protocol
The Enumerable protocol lets you iterate over the elements in a collection.
Collectable is in some sense the opposite—it allows you to build a collection by inserting elements into it.
The collectable API is pretty low-level, so you’ll typically access it via Enum.into.
iex> Enum.into 1..5, []
[1, 2, 3, 4, 5]
iex> Enum.into 1..5, [100, 101 ]
[100, 101, 1, 2, 3, 4, 5]
# output streams are collectable
iex> Enum.into IO.stream(:stdio, :line), IO.stream(:stdio, :line)
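Lists are not the only targets. As a sketch, Enum.into can also collect into a map or a MapSet, and its three-argument form transforms each element on the way in. The module name IntoDemo is hypothetical.

```elixir
defmodule IntoDemo do
  # A keyword list is a list of {key, value} tuples, so it collects into a map.
  def keyword_to_map, do: Enum.into([a: 1, b: 2], %{})

  # Collecting into a MapSet drops duplicates.
  def unique, do: Enum.into([1, 2, 2, 3], MapSet.new())

  # Enum.into/3 applies a function to each element before inserting it.
  def squares, do: Enum.into(1..3, %{}, fn x -> {x, x * x} end)
end

IO.inspect(IntoDemo.keyword_to_map())  #=> %{a: 1, b: 2}
IO.inspect(IntoDemo.squares())         #=> %{1 => 1, 2 => 4, 3 => 9}
```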
Comprehensions
When you’re writing functional code, you often map and filter collections of things. Elixir provides a general-purpose shortcut for this: the comprehension.
The idea of a comprehension is fairly simple:
- given one or more collections
- extract all combinations of values from each
- optionally filter the values
- then generate a new collection using the values that remain
Comprehensions
The general syntax for comprehensions is deceptively simple:
result = for generator or filter… [, into: value ], do: expression
iex> for x <- [ 1, 2, 3, 4, 5 ], do: x * x
[1, 4, 9, 16, 25]
iex> for x <- [ 1, 2, 3, 4, 5 ], x < 4, do: x * x
[1, 4, 9]
A generator specifies how you want to extract values from a collection.
pattern <- enumerable_thing
Any variables matched in the pattern are available in the rest of the comprehension.
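To make the "shortcut" claim concrete, here is a sketch that writes the filtered-squares example both as a comprehension and as an explicit Enum.filter plus Enum.map pipeline. The module and function names are made up for illustration.

```elixir
defmodule ComprehensionDemo do
  # One generator, one filter, and a do expression.
  def with_for(list) do
    for x <- list, x < 4, do: x * x
  end

  # The same result spelled out with Enum functions.
  def with_enum(list) do
    list
    |> Enum.filter(&(&1 < 4))
    |> Enum.map(&(&1 * &1))
  end
end

IO.inspect(ComprehensionDemo.with_for([1, 2, 3, 4, 5]))   #=> [1, 4, 9]
IO.inspect(ComprehensionDemo.with_enum([1, 2, 3, 4, 5]))  #=> [1, 4, 9]
```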
Comprehensions
For example, x <- [1,2,3] says that we want to first run the rest of the comprehension with x set to 1.
Then we run it with x set to 2, and so on.
If we have two generators, their operations are nested, so
x <- [1,2], y <- [5,6]
will run the rest of the comprehension with
x=1, y=5; x=1, y=6; x=2, y=5; and x=2, y=6.
iex> for x <- [1,2], y <- [5,6], do: x * y
[5, 6, 10, 12]
iex> for x <- [1,2], y <- [5,6], do: {x, y}
[{1, 5}, {1, 6}, {2, 5}, {2, 6}]
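One way to picture the nesting is to write the same thing with Enum.flat_map on the outside and Enum.map on the inside. This is only a sketch of the equivalence, not a claim about how comprehensions are compiled. The module name NestedDemo is hypothetical.

```elixir
defmodule NestedDemo do
  # Two generators: y cycles fully for each value of x.
  def with_for do
    for x <- [1, 2], y <- [5, 6], do: {x, y}
  end

  # The outer collection drives flat_map; the inner one drives map.
  def with_flat_map do
    Enum.flat_map([1, 2], fn x ->
      Enum.map([5, 6], fn y -> {x, y} end)
    end)
  end
end

IO.inspect(NestedDemo.with_for())       #=> [{1, 5}, {1, 6}, {2, 5}, {2, 6}]
IO.inspect(NestedDemo.with_flat_map())  #=> [{1, 5}, {1, 6}, {2, 5}, {2, 6}]
```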
Comprehensions
You can use variables from generators in later generators:
iex> min_maxes = [{1,4}, {2,3}, {10, 15}]
[{1, 4}, {2, 3}, {10, 15}]
iex> for {min,max} <- min_maxes, n <- min..max, do: n
[1, 2, 3, 4, 2, 3, 10, 11, 12, 13, 14, 15]
Here n <- min..max is a second generator; it uses the min and max matched by the first.
A filter is a predicate. It acts as a gatekeeper for the rest of the comprehension: if the condition is false, then the comprehension moves on to the next iteration without generating an output value.
Comprehensions
This code uses a comprehension to list pairs of numbers from 1 to 8 whose product is a multiple of 10.
iex> first8 = [ 1,2,3,4,5,6,7,8 ]
[1, 2, 3, 4, 5, 6, 7, 8]
iex> for x <- first8, y <- first8, x >= y, rem(x*y, 10)==0, do: { x, y }
[{5, 2}, {5, 4}, {6, 5}, {8, 5}]
It uses two generators (to cycle through the pairs of numbers) and two filters.
The first filter x >= y allows only pairs in which the first number is at least the value of the second.
The second filter rem(x*y, 10)==0 checks to see if the product is a multiple of 10.
Comprehensions
This comprehension iterates 64 times, with x=1, y=1; x=1, y=2; and so on. However, the first filter cuts the iteration short when x is less than y. This means the second filter runs only 36 times.
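The counts can be checked by measuring how many pairs exist at each stage. The module name FilterCount is made up for this sketch.

```elixir
defmodule FilterCount do
  @first8 Enum.to_list(1..8)

  # Every combination of x and y: 8 * 8 pairs.
  def all_pairs do
    length(for x <- @first8, y <- @first8, do: {x, y})
  end

  # Pairs that survive the first filter and so reach the second one.
  def after_first_filter do
    length(for x <- @first8, y <- @first8, x >= y, do: {x, y})
  end

  # Pairs that survive both filters.
  def after_both_filters do
    length(for x <- @first8, y <- @first8, x >= y, rem(x * y, 10) == 0, do: {x, y})
  end
end

IO.inspect(FilterCount.all_pairs())           #=> 64
IO.inspect(FilterCount.after_first_filter())  #=> 36
IO.inspect(FilterCount.after_both_filters())  #=> 4
```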
Comprehensions
Because the first term in a generator is a pattern, we can use it to deconstruct structured data.
Here’s a comprehension that swaps the keys and values in a keyword list.
iex> reports = [ dallas: :hot, minneapolis: :cold, dc: :muggy, la: :smoggy ]
[dallas: :hot, minneapolis: :cold, dc: :muggy, la: :smoggy]
iex> for { city, weather } <- reports, do: { weather, city }
[hot: :dallas, cold: :minneapolis, muggy: :dc, smoggy: :la]
Bit Comprehensions
# generator is enclosed in << >>
iex> for << ch <- "hello" >>, do: ch
'hello'
# convert each byte back into a string
iex> for << ch <- "hello" >>, do: <<ch>>
["h", "e", "l", "l", "o"]
# convert to octal
iex> for << << b1::size(2), b2::size(3), b3::size(3) >> <- "hello" >>,
...> do: "0#{b1}#{b2}#{b3}"
["0150", "0145", "0154", "0154", "0157"]
A bitstring (and, by extension, a binary or a string) is simply a collection of ones and zeroes and allows comprehensions.
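The same idea works on any binary, not just text. As a sketch, the comprehension below pulls three bytes at a time out of a binary and treats each group as an RGB pixel. The pixel data and the module name PixelDemo are illustrative.

```elixir
defmodule PixelDemo do
  # Each iteration matches three 8-bit values from the binary.
  def rgb(pixels) do
    for <<r::8, g::8, b::8 <- pixels>>, do: {r, g, b}
  end
end

IO.inspect(PixelDemo.rgb(<<213, 45, 132, 64, 76, 32>>))
#=> [{213, 45, 132}, {64, 76, 32}]
```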
Scoping and Comprehensions
iex> name = "Dave"
"Dave"
iex> for name <- [ "cat", "dog" ], do: String.upcase(name)
["CAT", "DOG"]
iex> name
"Dave"
iex>
All variable assignments inside a comprehension are local to that comprehension—you will not affect the value of a variable in the outer scope.
The Value Returned by a Comprehension
So far, the comprehension has returned a list using the do expression.
This can be changed with the into: parameter, which takes a collection that is to receive the results of the comprehension.
iex> for x <- ~w{ cat dog }, into: %{}, do: { x, String.upcase(x) }
%{"cat" => "CAT", "dog" => "DOG"}
# use Map.new
iex> for x <- ~w{ cat dog }, into: Map.new, do: { x, String.upcase(x) }
%{"cat" => "CAT", "dog" => "DOG"}
# non empty collection
iex> for x <- ~w{ cat dog }, into: %{"ant" => "ANT"}, do: { x, String.upcase(x) }
%{"ant" => "ANT", "cat" => "CAT", "dog" => "DOG"}
# Collectable
iex> for x <- ~w{ cat dog }, into: IO.stream(:stdio,:line), do: "<<#{x}>>\n"
<<cat>>
<<dog>>
%IO.Stream{device: :standard_io, line_or_bytes: :line, raw: false}
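Binaries are collectable too, so into: can also build a string. The sketch below combines a bit generator, a filter, and into: "" to strip spaces. The module name StripDemo is hypothetical.

```elixir
defmodule StripDemo do
  # Walk the string byte by byte, skip spaces, and collect the rest into a binary.
  def strip_spaces(string) do
    for <<c <- string>>, c != ?\s, into: "", do: <<c>>
  end
end

IO.inspect(StripDemo.strip_spaces("hello world"))  #=> "helloworld"
```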
Thank you!
Programming Elixir 1.6 Chapter 10
By Dustin McCraw