Elixir 1.6

Chapter 10 

Processing Collections—Enum and Stream

Collections

Elixir comes with a number of types that act as collections: lists, maps, ranges, files and even functions.

All of these collection types share some characteristics:

  • you can iterate through them
  • you can add things to them (sometimes)

Things that can be iterated are said to implement the Enumerable protocol.

Elixir provides two modules that have a bunch of iteration functions:

  1. Enum module
  2. Stream module
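Lists, ranges, and maps all implement Enumerable, so one Enum function can be applied to any of them. Here is a minimal sketch. The variable names are illustrative only.

```elixir
# Any Enumerable works with Enum: here a list, a range, and a map
list  = [1, 2, 3]
range = 1..3
map   = %{a: 1, b: 2, c: 3}

# Enum.count/1 treats all three collection types the same way
counts = Enum.map([list, range, map], &Enum.count/1)

# A map is enumerated as {key, value} tuples
doubled = map |> Enum.map(fn {_key, value} -> value * 2 end) |> Enum.sort()

IO.inspect(counts)    # [3, 3, 3]
IO.inspect(doubled)   # [2, 4, 6]
```

The map results are sorted because a map's iteration order should not be relied on.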

 

Enum—Processing Collections

The Enum module is probably the most used of all the Elixir libraries. Employ it to iterate, filter, combine, split, and otherwise manipulate collections.

 

# Convert any collection into a list:
iex> list = Enum.to_list 1..5
[1, 2, 3, 4, 5]

# Concatenate collections:
iex> Enum.concat([1,2,3], [4,5,6])
[1, 2, 3, 4, 5, 6]
iex> Enum.concat [1,2,3], 'abc'
[1, 2, 3, 97, 98, 99]

# Create collections whose elements are some function of the original:
iex> Enum.map(list, &(&1 * 10))
[10, 20, 30, 40, 50]
iex> Enum.map(list, &String.duplicate("*", &1))
["*", "**", "***", "****", "*****"]


# Select elements by position or criteria:
iex> Enum.at(10..20, 3)
13
iex> Enum.at(10..20, 20)
nil
iex> Enum.at(10..20, 20, :no_one_here)
:no_one_here
iex> Enum.filter(list, &(&1 > 2))
[3, 4, 5]
iex> require Integer     # to get access to is_even
Integer
iex> Enum.filter(list, &Integer.is_even/1)
[2, 4]
iex> Enum.reject(list, &Integer.is_even/1)
[1, 3, 5]

# Sort and compare elements:
iex> Enum.sort ["there", "was", "a", "crooked", "man"]
["a", "crooked", "man", "there", "was"]
iex> Enum.sort ["there", "was", "a", "crooked", "man"],
...>        &(String.length(&1) <= String.length(&2))
["a", "was", "man", "there", "crooked"]
iex> Enum.max ["there", "was", "a", "crooked", "man"]
"was"
iex> Enum.max_by ["there", "was", "a", "crooked", "man"], &String.length/1
"crooked"

A Note on Sorting

In our example of sort, we used:

iex> Enum.sort ["there", "was", "a", "crooked", "man"],
...>        &(String.length(&1) <= String.length(&2))

It’s important to use <= and not just < if you want the sort to be stable, that is, if elements that compare as equal should keep their original relative order.
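The sketch below shows the effect of stability. It sorts the same words with the <= comparator and with Enum.sort_by/2, which compares on a derived key and keeps equal-keyed elements in their original order. "was" and "man" have the same length, so both results should keep "was" ahead of "man".

```elixir
words = ["there", "was", "a", "crooked", "man"]

# <= returns true for equal lengths, so "was" stays ahead of "man"
by_comparator = Enum.sort(words, &(String.length(&1) <= String.length(&2)))

# sort_by/2 compares on a derived key and also keeps ties in input order
by_key = Enum.sort_by(words, &String.length/1)

IO.inspect(by_comparator)  # ["a", "was", "man", "there", "crooked"]
IO.inspect(by_key)         # ["a", "was", "man", "there", "crooked"]
```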


# Merge collections:
iex> Enum.zip(list, [:a, :b, :c])
[{1, :a}, {2, :b}, {3, :c}]
iex> Enum.with_index(["once", "upon", "a", "time"])
[{"once", 0}, {"upon", 1}, {"a", 2}, {"time", 3}]

# Fold elements into a single value:
iex> Enum.reduce(1..100, &(&1+&2))
5050
iex> Enum.reduce(["now", "is", "the", "time"], fn word, longest ->
...>        if String.length(word) > String.length(longest) do
...>           word
...>        else
...>           longest
...>        end
...> end)
"time"
iex> Enum.reduce(["now", "is", "the", "time"], 0, fn word, longest ->
...>        if    String.length(word) > longest,
...>        do:   String.length(word),
...>        else: longest
...> end)
4


# Deal a hand of cards:
iex> import Enum
iex> deck = for rank <- '23456789TJQKA', suit <- 'CDHS', do: [suit,rank]
['C2', 'D2', 'H2', 'S2', 'C3', 'D3', ... ]
iex> deck |> shuffle |> take(13)
['DQ', 'S6', 'HJ', 'H4', 'C7', 'D6', 'SJ', 'S9', 'D7', 'HA', 'S4', 'C2', 'CT']
iex> hands = deck |> shuffle |> chunk_every(13)
[['D8', 'CQ', 'H2', 'H3', 'HK', 'H9', 'DK', 'S9', 'CT', 'ST', 'SK', 'D2', 'HA'],
 ['C5', 'S3', 'CK', 'HQ', 'D3', 'D4', 'CA', 'C8', 'S6', 'DQ', 'H5', 'S2', 'C4'],
 ['C7', 'C6', 'C2', 'D6', 'D7', 'SA', 'SQ', 'H8', 'DT', 'C3', 'H7', 'DA', 'HT'],
 ['S5', 'S4', 'C9', 'S8', 'D5', 'H4', 'S7', 'SJ', 'HJ', 'D9', 'DJ', 'CJ', 'H6']]
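Because shuffle is random, the hands differ on every run. One rough check is that every card is still dealt exactly once. This check is sketched here and is not taken from the slides.

```elixir
# The same 52-card deck as above: each card is a charlist such as 'C2'
deck = for rank <- '23456789TJQKA', suit <- 'CDHS', do: [suit, rank]

# Shuffle and deal four hands of 13 cards
hands = deck |> Enum.shuffle() |> Enum.chunk_every(13)

# Whatever the shuffle, the dealt cards are exactly the deck's cards
all_dealt = hands |> Enum.concat() |> Enum.sort()

IO.inspect(length(hands))                  # 4
IO.inspect(Enum.map(hands, &length/1))     # [13, 13, 13, 13]
IO.inspect(all_dealt == Enum.sort(deck))   # true
```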

Streams—Lazy Enumerables

[ 1, 2, 3, 4, 5 ]
|> Enum.map(&(&1*&1))
|> Enum.with_index
|> Enum.map(fn {value, index} -> value - index end)
|> IO.inspect   #=> [1,3,7,13,21]

The Enum module is greedy. This means that when you pass it a collection, it potentially consumes all the contents of that collection. It also means the result will typically be another collection.

So, this pipeline generates four lists on its way to outputting the final result.
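For comparison, here is a sketch of the same computation built from Stream functions, with Stream.with_index/2 standing in for Enum.with_index. The steps are deferred until Enum.to_list/1 runs, so only the final list is built.

```elixir
# Each step is lazy; Enum.to_list/1 pulls the values through all of them
result =
  [1, 2, 3, 4, 5]
  |> Stream.map(&(&1 * &1))
  |> Stream.with_index()
  |> Stream.map(fn {value, index} -> value - index end)
  |> Enum.to_list()

IO.inspect(result)   # [1, 3, 7, 13, 21]
```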


IO.puts File.read!("/usr/share/dict/words")
        |> String.split
        |> Enum.max_by(&String.length/1)

In this case, we read the whole dictionary into memory (on my machine that’s 2.4MB), then split into a list of words (236,000 of them) before processing it to find the longest (which happens to be formaldehydesulphoxylate).

 

What we really want is to process the elements in the collection as we need them.

A Stream Is a Composable Enumerator

iex> s = Stream.map [1, 3, 5, 7], &(&1 + 1)
#Stream<[enum: [1, 3, 5, 7], funs: [#Function<37.75994740/1 in Stream.map/2>] ]>

If we’d called Enum.map, we’d have seen the result [2,4,6,8] come back immediately.
Instead we get back a stream value that contains a specification of what we intended.
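One way to observe this laziness is to use message passing as a probe. This probe is an assumption of the sketch, not something from the slides. The mapping function below sends the current process a message each time it runs, and the sketch checks the mailbox length before and after enumeration.

```elixir
parent = self()

# Number of messages currently waiting in this process's mailbox
queue_len = fn -> elem(Process.info(parent, :message_queue_len), 1) end

before_stream = queue_len.()

# The mapping function sends a message every time it actually runs
s =
  Stream.map([1, 3, 5, 7], fn x ->
    send(parent, {:mapped, x})
    x + 1
  end)

after_stream = queue_len.()   # building the stream ran nothing
result = Enum.to_list(s)      # enumeration runs the function once per element
after_enum = queue_len.()

IO.inspect(result)                        # [2, 4, 6, 8]
IO.inspect(after_stream - before_stream)  # 0
IO.inspect(after_enum - before_stream)    # 4
```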

 

Streams

How do we get the stream to start giving us results? Treat it as a collection and pass it to a function in the Enum module:

iex> s = Stream.map [1, 3, 5, 7], &(&1 + 1)
#Stream<...>
iex> Enum.to_list s
[2, 4, 6, 8]

Composable Streams

iex> squares = Stream.map [1, 2, 3, 4], &(&1*&1)
#Stream<[enum: [1, 2, 3, 4],
         funs: [#Function<32.133702391 in Stream.map/2>] ]>

iex> plus_ones = Stream.map squares, &(&1+1)
#Stream<[enum: [1, 2, 3, 4],
         funs: [#Function<32.133702391 in Stream.map/2>,
                #Function<32.133702391 in Stream.map/2>] ]>

iex> odds = Stream.filter plus_ones, fn x -> rem(x,2) == 1 end
#Stream<[enum: [1, 2, 3, 4],
         funs: [#Function<26.133702391 in Stream.filter/2>,
                #Function<32.133702391 in Stream.map/2>,
                #Function<32.133702391 in Stream.map/2>] ]>

iex> Enum.to_list odds
[5, 17]

Because streams are enumerable, you can also pass a stream to a stream function. Because of this, we say that streams are composable.

 


In real life we would write this:

[1,2,3,4]
|> Stream.map(&(&1*&1))
|> Stream.map(&(&1+1))
|> Stream.filter(fn x -> rem(x,2) == 1 end)
|> Enum.to_list   #=> [5, 17]

We’re never creating intermediate lists: we’re just passing successive elements of each of the collections to the next in the chain.
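Tracing each step makes this element-by-element flow visible. In this sketch, the Trace module and the :trace message shape are hypothetical helpers. Every stage sends a message as it runs, and the collected events show each number passing through both maps before the next number starts.

```elixir
defmodule Trace do
  # Hypothetical helper: collect the {:trace, event} messages already in the
  # mailbox, oldest first, without waiting for new ones
  def drain(acc \\ []) do
    receive do
      {:trace, event} -> drain([event | acc])
    after
      0 -> Enum.reverse(acc)
    end
  end
end

parent = self()

result =
  [1, 2, 3, 4]
  |> Stream.map(fn x -> send(parent, {:trace, {:square, x}}); x * x end)
  |> Stream.map(fn x -> send(parent, {:trace, {:plus_one, x}}); x + 1 end)
  |> Stream.filter(fn x -> rem(x, 2) == 1 end)
  |> Enum.to_list()

# Alternates square/plus_one for 1, then for 2, and so on
events = Trace.drain()

IO.inspect(result)   # [5, 17]
```

If Enum.map replaced Stream.map here, all the :square events would come before any :plus_one event, because each Enum step finishes the whole list first.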


Streams aren’t only for lists. More and more Elixir modules now support streams. Here’s our longest-word code written using streams:

IO.puts File.open!("/usr/share/dict/words")
        |> IO.stream(:line)
        |> Enum.max_by(&String.length/1)

The magic here is the call to IO.stream, which converts an IO device (in this case the open file) into a stream that serves one line at a time.

# there's a shortcut: File.stream! opens the file and streams its lines in one step
IO.puts File.stream!("/usr/share/dict/words") |> Enum.max_by(&String.length/1)

The good news is that there is no intermediate storage. The bad news is that it runs about two times slower than the previous version.
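You can try the same idea without a system dictionary. This sketch writes a tiny word list to a temporary file and streams it line by line. The file name stream_demo_words.txt is hypothetical. File.stream! keeps each line's trailing newline, so the lines are trimmed before their lengths are compared.

```elixir
# Hypothetical stand-in for /usr/share/dict/words, written to a temp file
path = Path.join(System.tmp_dir!(), "stream_demo_words.txt")
File.write!(path, "cat\nformaldehyde\ndog\n")

longest =
  File.stream!(path)                # yields one line at a time, "\n" included
  |> Stream.map(&String.trim/1)     # drop the trailing newline
  |> Enum.max_by(&String.length/1)

IO.puts(longest)   # formaldehyde

File.rm!(path)
```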

Infinite Streams

iex> Enum.map(1..10_000_000, &(&1+1)) |> Enum.take(5)
[2, 3, 4, 5, 6]
# takes about 8 seconds: Enum.map builds all ten million elements first

iex> Stream.map(1..10_000_000, &(&1+1)) |> Enum.take(5)
[2, 3, 4, 5, 6]
# the result comes back instantaneously

Because streams are lazy, there’s no need for the whole collection to be available up front.

Creating Your Own Streams

Streams are implemented solely in Elixir libraries; there is no specific runtime support. The actual implementation is complex, so you'll probably want to use some helpful wrapper functions to do the heavy lifting. There are a number of these, including:

  • cycle
  • repeatedly
  • iterate
  • unfold 
  • resource

 

Stream.cycle

Stream.cycle takes an enumerable and returns an infinite stream containing that enumerable’s elements. When it gets to the end, it repeats from the beginning, indefinitely. 

iex> Stream.cycle(~w{ green white }) |>
...> Stream.zip(1..5) |>
...> Enum.map(fn {class, value} ->
...>      ~s{<tr class="#{class}"><td>#{value}</td></tr>\n} end) |>
...> IO.puts
<tr class="green"><td>1</td></tr>
<tr class="white"><td>2</td></tr>
<tr class="green"><td>3</td></tr>
<tr class="white"><td>4</td></tr>
<tr class="green"><td>5</td></tr>
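Because the stream is infinite, something downstream has to decide when to stop. This sketch shows two ways that can happen. The first is Enum.take/2 on its own. The second is Stream.zip/2 with a finite range, as in the table example above; here the rows are returned as a list rather than printed.

```elixir
# Stream.cycle/1 replays its input forever; Enum.take/2 decides when to stop
cycled = Stream.cycle([1, 2, 3]) |> Enum.take(7)

# The table-row example, collecting the rows instead of printing them;
# zipping with the finite range 1..3 ends the infinite cycle
rows =
  Stream.cycle(~w{green white})
  |> Stream.zip(1..3)
  |> Enum.map(fn {class, value} -> ~s{<tr class="#{class}"><td>#{value}</td></tr>} end)

IO.inspect(cycled)   # [1, 2, 3, 1, 2, 3, 1]
```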

Stream.repeatedly

Stream.repeatedly takes a function and invokes it each time a new value is wanted.

iex> Stream.repeatedly(fn -> true end) |> Enum.take(3)
[true, true, true]
iex> Stream.repeatedly(&:rand.uniform/0) |> Enum.take(3)
[0.7230402056221108, 0.94581636451987, 0.5014907142064751]
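Here is a slightly more practical sketch that simulates die rolls. :rand.uniform/1 returns a random integer between 1 and its argument, so each call produces one roll.

```elixir
# Each call to the function produces one simulated roll of a six-sided die
rolls = Stream.repeatedly(fn -> :rand.uniform(6) end) |> Enum.take(10)

IO.inspect(length(rolls))                     # 10
IO.inspect(Enum.all?(rolls, &(&1 in 1..6)))   # true
```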

Stream.iterate

Stream.iterate(start_value, next_fun) generates an infinite stream. The first value is start_value. The next value is generated by applying next_fun to this value. This continues for as long as the stream is being used, with each value being the result of applying next_fun to the previous value.

iex> Stream.iterate(0, &(&1+1)) |> Enum.take(5)
[0, 1, 2, 3, 4]
iex> Stream.iterate(2, &(&1*&1)) |> Enum.take(5)
[2, 4, 16, 256, 65536]
iex> Stream.iterate([], &[&1]) |> Enum.take(5)
[[], [[]], [[[]]], [[[[]]]], [[[[[]]]]]]
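The stream never ends on its own, so it is usually paired with a function that stops pulling values. The sketch below applies the Collatz step, a swapped-in example that is not from the slides, and uses Enum.take_while/2 to stop at the first value that fails the test.

```elixir
# One Collatz step: halve even numbers, map odd n to 3n + 1
next = fn
  n when rem(n, 2) == 0 -> div(n, 2)
  n -> 3 * n + 1
end

# The stream itself is infinite; take_while/2 stops pulling at the first 1
sequence = Stream.iterate(6, next) |> Enum.take_while(&(&1 != 1))

IO.inspect(sequence)   # [6, 3, 10, 5, 16, 8, 4, 2]
```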

Stream.unfold

Stream.unfold is related to iterate, but you can be more explicit both about the values output to the stream and about the values passed to the next iteration. You supply an initial state and a function, and the function's general form is

fn state -> { stream_value, new_state } end

iex> Stream.unfold({0,1}, fn {f1,f2} -> {f1, {f2, f1+f2}} end) |> Enum.take(15)
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377]


The function uses the argument to create two values, returned as a tuple. The first is the value to be returned by this iteration of the stream, and the second is the value to be passed to the function on the next iteration of the stream.

 



Here the state is a tuple containing the current and the next number in the sequence. We seed it with the initial state of {0, 1}. The value each iteration of the stream returns is the first of the state values. The new state moves one down the sequence, so an initial state of {f1,f2} becomes a new state of {f2,f1+f2}.
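The Fibonacci stream never ends, which is why the example needs Enum.take. If the function returns nil instead of a tuple, Stream.unfold ends the stream. This small countdown sketch uses that to build a finite stream.

```elixir
# Returning nil instead of {value, next_state} ends the stream
countdown =
  Stream.unfold(5, fn
    0 -> nil            # stop: no more values
    n -> {n, n - 1}     # emit n, continue from n - 1
  end)

IO.inspect(Enum.to_list(countdown))   # [5, 4, 3, 2, 1]
```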


Stream.resource

How would you implement something like File.stream! yourself? Use Stream.resource:

Stream.resource(fn -> File.open!("sample") end,
                fn file ->
                  case IO.read(file, :line) do
                    data when is_binary(data) -> {[data], file}
                    _ -> {:halt, file}
                  end
                end,
                fn file -> File.close(file) end)

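Stream.resource takes three functions:

  1. Open the resource. This runs once, when the stream starts, and its result is passed to the next function.
  2. Read from the resource. Each call returns {list_of_values, new_state}, or {:halt, state} to end the stream.
  3. Close/clean up the resource. This runs when the stream ends, whether it was exhausted or halted early.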
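This self-contained variant of the file example shows why the third function matters. It uses a hypothetical temporary file. Even when the consumer stops after the first line, Stream.resource still runs the cleanup function. For this sketch, the cleanup also sends the current process a :closed message so we can check that it ran.

```elixir
parent = self()

# Hypothetical sample file so the sketch does not depend on an existing "sample"
path = Path.join(System.tmp_dir!(), "resource_demo_sample.txt")
File.write!(path, "one\ntwo\nthree\n")

lines =
  Stream.resource(
    # 1. open the resource
    fn -> File.open!(path) end,
    # 2. read one line per call; anything but a string (e.g. :eof) halts
    fn file ->
      case IO.read(file, :line) do
        data when is_binary(data) -> {[data], file}
        _ -> {:halt, file}
      end
    end,
    # 3. clean up, and report back that cleanup ran
    fn file ->
      send(parent, :closed)
      File.close(file)
    end
  )

# Ask for only the first line: the stream halts early...
first = Enum.take(lines, 1)

# ...yet the cleanup function has already run
closed =
  receive do
    :closed -> true
  after
    0 -> false
  end

IO.inspect(first)    # ["one\n"]
IO.inspect(closed)   # true

File.rm!(path)
```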


Let's create a timer that counts down the seconds to the start of the next minute.

# countdown.exs
defmodule Countdown do
  def sleep(seconds) do
    receive do
      after seconds * 1000 -> nil
    end
  end

  # uses the macOS `say` command to speak the text
  def say(text) do
    spawn fn -> :os.cmd('say #{text}') end
  end

  def timer do
    Stream.resource(
      fn ->          # the number of seconds to the start of the next minute
        {_h, _m, s} = :erlang.time
        60 - s - 1
      end,

      fn             # wait for the next second, then return its countdown
        0 ->
          {:halt, 0}

        count ->
          sleep(1)
          { [inspect(count)], count - 1 }
      end,

      fn _ -> nil end   # nothing to deallocate
    )
  end
end


$ iex countdown.exs
iex> counter = Countdown.timer
#Function<17.133702391 in Stream.resource/3>
iex> printer = counter |> Stream.each(&IO.puts/1)
#Stream<[enum: #Function<17.133702391 in Stream.resource/3>,
 funs: [#Function<0.133702391 in Stream.each/2>] ]>
iex> speaker = printer |> Stream.each(&Countdown.say/1)
#Stream<[enum: #Function<17.133702391 in Stream.resource/3>,
 funs: [#Function<0.133702391 in Stream.each/2>,
  #Function<0.133702391 in Stream.each/2>] ]>

iex> speaker |> Enum.take(5)
37     ** numbers are output once
36     ** per second. Even cooler, the
35     ** computer says
34     ** "thirty seven", "thirty six"…
33
["37", "36", "35", "34", "33"]

# wait a few seconds
iex> speaker |> Enum.take(3)
29
28
27
["29", "28", "27"]

Streams in Practice

In the same way that functional programming requires you to look at problems in a new way, streams ask you to look at iteration and collections afresh.
Consider using a stream when you want to defer processing until you need the data,
and when you need to deal with large numbers of things without necessarily generating them all at once.

The Collectable Protocol

The Enumerable protocol lets you iterate over the elements of a collection. Collectable is in some sense the opposite: it allows you to build a collection by inserting elements into it.

The collectable API is pretty low-level, so you’ll typically access it via Enum.into.

iex> Enum.into 1..5, []
[1, 2, 3, 4, 5]

iex> Enum.into 1..5, [100, 101 ]
[100, 101, 1, 2, 3, 4, 5]

# output streams are collectable, so this lazily copies standard input to standard output
iex> Enum.into IO.stream(:stdio, :line), IO.stream(:stdio, :line)
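Maps and MapSets are collectable too. This short sketch uses Enum.into with each of them. It also uses Enum.into/3, which transforms every element as it is inserted.

```elixir
# A map collects {key, value} tuples
kw_map = Enum.into([a: 1, b: 2], %{})

# Enum.into/3 transforms each element before inserting it
squares = Enum.into(1..3, %{}, fn x -> {x, x * x} end)

# A MapSet silently drops duplicates as elements arrive
unique = Enum.into([1, 2, 2, 3], MapSet.new())

IO.inspect(kw_map)               # %{a: 1, b: 2}
IO.inspect(squares)              # %{1 => 1, 2 => 4, 3 => 9}
IO.inspect(MapSet.size(unique))  # 3
```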

Comprehensions

When you’re writing functional code, you often map and filter collections of things.  Elixir provides a general-purpose shortcut for this: the comprehension.

 

The idea of a comprehension is fairly simple:

  • given one or more collections
  • extract all combinations of values from each
  • optionally filter the values
  • then generate a new collection using the values that remain


The general syntax for comprehensions is deceptively simple:

result = for generator or filter… [, into: value ], do: expression

iex> for x <- [ 1, 2, 3, 4, 5 ], do: x * x
[1, 4, 9, 16, 25]
iex> for x <- [ 1, 2, 3, 4, 5 ], x < 4, do: x * x
[1, 4, 9]

A generator specifies how you want to extract values from a collection.

pattern <- enumerable_thing

Any variables matched in the pattern are available in the rest of the comprehension.


For example, x <- [1,2,3] says that we want to first run the rest of the comprehension with x set to 1. Then we run it with x set to 2, and so on.

If we have two generators, their operations are nested, so

x <- [1,2], y <- [5,6]

will run the rest of the comprehension with x=1, y=5; x=1, y=6; x=2, y=5; and x=2, y=6.

iex> for x <- [1,2], y <- [5,6], do:  x * y
[5, 6, 10, 12]
iex> for x <- [1,2], y <- [5,6], do:  {x,  y}
[{1, 5}, {1, 6}, {2, 5}, {2, 6}]
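Nested generators behave like a loop inside a loop. This sketch shows the equivalence by computing the same products with Enum.flat_map/2 around an inner Enum.map/2.

```elixir
# Two generators: x is the outer loop, y the inner one
via_for = for x <- [1, 2], y <- [5, 6], do: x * y

# The same nesting spelled out with Enum
via_enum = Enum.flat_map([1, 2], fn x -> Enum.map([5, 6], fn y -> x * y end) end)

IO.inspect(via_for)               # [5, 6, 10, 12]
IO.inspect(via_for == via_enum)   # true
```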


You can use variables from generators in later generators:

iex> min_maxes = [{1,4}, {2,3}, {10, 15}]
[{1, 4}, {2, 3}, {10, 15}]
iex> for {min,max} <- min_maxes, n <- min..max, do: n
[1, 2, 3, 4, 2, 3, 10, 11, 12, 13, 14, 15]

Here n <- min..max is a second generator whose range comes from the min and max bound by the first.

A filter, by contrast, is a predicate. It acts as a gatekeeper for the rest of the comprehension: if the condition is false, then the comprehension moves on to the next iteration without generating an output value.
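This sketch shows a single filter and the equivalent Enum.filter/Enum.map pipeline. The numbers are arbitrary.

```elixir
# rem(n, 3) == 0 is a filter: values for which it is false are skipped
via_for = for n <- 1..10, rem(n, 3) == 0, do: n * 10

# Equivalent Enum pipeline: filter first, then map
via_enum = 1..10 |> Enum.filter(&(rem(&1, 3) == 0)) |> Enum.map(&(&1 * 10))

IO.inspect(via_for)   # [30, 60, 90]
```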


This code uses a comprehension to list pairs of numbers from 1 to 8 whose product is a multiple of 10.

iex> first8 = [ 1,2,3,4,5,6,7,8 ]
[1, 2, 3, 4, 5, 6, 7, 8]
iex> for x <- first8, y <- first8, x >= y, rem(x*y, 10)==0, do: { x, y }
[{5, 2}, {5, 4}, {6, 5}, {8, 5}]

It uses two generators (to cycle through the pairs of numbers) and two filters. The first filter, x >= y, allows only pairs in which the first number is at least the value of the second. The second filter, rem(x*y, 10) == 0, checks to see if the product is a multiple of 10.


This comprehension iterates 64 times, with x=1, y=1; x=1, y=2; and so on. However, the first filter stops processing a pair as soon as x is less than y. This means the second filter runs only 36 times.
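You can check the 64 and the 36 by counting pairs directly. This sketch drops the second filter and counts how many pairs each stage lets through.

```elixir
first8 = Enum.to_list(1..8)

# Every (x, y) pair the two generators produce
all_pairs = for x <- first8, y <- first8, do: {x, y}

# Pairs that get past the first filter and so reach rem(x * y, 10) == 0
past_first_filter = for x <- first8, y <- first8, x >= y, do: {x, y}

IO.inspect(length(all_pairs))           # 64
IO.inspect(length(past_first_filter))   # 36
```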


Because the first term in a generator is a pattern, we can use it to deconstruct structured data. Here’s a comprehension that swaps the keys and values in a keyword list:

iex> reports = [ dallas: :hot, minneapolis: :cold, dc: :muggy, la: :smoggy ]
[dallas: :hot, minneapolis: :cold, dc: :muggy, la: :smoggy]

iex> for { city, weather } <- reports, do: { weather, city }
[hot: :dallas, cold: :minneapolis, muggy: :dc, smoggy: :la]

Bit Comprehensions

# generator is enclosed in << >>
iex> for << ch <- "hello" >>, do: ch
'hello'

# convert each byte back into a one-character string
iex> for << ch <- "hello" >>, do: <<ch>>
["h", "e", "l", "l", "o"]

# convert to octal
iex> for << << b1::size(2), b2::size(3), b3::size(3) >> <- "hello" >>,
...> do: "0#{b1}#{b2}#{b3}"
["0150", "0145", "0154", "0154", "0157"]

A bitstring (and, by extension, a binary or a string) is simply a collection of ones and zeroes, so it too can be the source of a generator in a comprehension. When it is, the generator is enclosed in << and >>.
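The generators above take one byte at a time, which matches characters only for ASCII text. This sketch uses a non-ASCII word to show the difference. Adding the ::utf8 modifier makes the generator take one whole code point per iteration.

```elixir
word = "héllo"

# A plain generator walks bytes: "é" is two bytes in UTF-8
bytes = for <<b <- word>>, do: b

# With ::utf8 the generator walks code points instead
chars = for <<cp::utf8 <- word>>, do: <<cp::utf8>>

IO.inspect(length(bytes))   # 6
IO.inspect(chars)           # ["h", "é", "l", "l", "o"]
```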

Scoping and Comprehensions

iex> name = "Dave"
"Dave"
iex> for name <- [ "cat", "dog" ], do: String.upcase(name)
["CAT", "DOG"]
iex> name
"Dave"

All variable assignments inside a comprehension are local to that comprehension: you will not affect the value of a variable in the outer scope.

The Value Returned by a Comprehension

So far, each comprehension has returned a list built from the values of its do expression. You can change this with the into: parameter, which takes a collection that is to receive the results of the comprehension.

iex> for x <- ~w{ cat dog }, into: %{}, do: { x, String.upcase(x) }
%{"cat" => "CAT", "dog" => "DOG"}

# use Map.new
iex> for x <- ~w{ cat dog }, into: Map.new, do: { x, String.upcase(x) }
%{"cat" => "CAT", "dog" => "DOG"}

# non-empty collection
iex> for x <- ~w{ cat dog }, into: %{"ant" => "ANT"}, do: { x, String.upcase(x) }
%{"ant" => "ANT", "cat" => "CAT", "dog" => "DOG"}

# any Collectable works, including an output stream
iex> for x <- ~w{ cat dog }, into: IO.stream(:stdio,:line), do: "<<#{x}>>\n"
<<cat>>
<<dog>>
%IO.Stream{device: :standard_io, line_or_bytes: :line, raw: false}
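A string is collectable as well, so into: "" appends the value of each do expression to a binary. This small sketch also uses a filter to skip every l.

```elixir
# Bytes pass the filter unless they are ?l; each survivor is appended to ""
no_ls = for <<c <- "hello">>, c != ?l, into: "", do: <<c>>

IO.inspect(no_ls)   # "heo"
```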

Thank you!

Programming Elixir 1.6 Chapter 10

By Dustin McCraw
