Elixir 1.6

Chapter 19 

OTP: Supervisors

Supervisors

The Elixir way says not to worry much about code that crashes; instead, make sure the overall application keeps running.

Imagine your application consists of hundreds or thousands of processes, each handling just a small part of a request. If one of those crashes, everything else carries on. All we need is something that notices the crash and restarts the failed process.

In the Elixir and OTP worlds, supervisors perform this process monitoring and restarting.

Supervisors and Workers

An Elixir supervisor has just one purpose—it manages one or more worker processes.

 

At its simplest, a supervisor is a process that uses the OTP supervisor behavior. It is given a list of processes to monitor and is told what to do if a process dies, and how to prevent restart loops.
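Restart-loop protection is configured with the supervisor options :max_restarts and :max_seconds (the defaults are 3 restarts in 5 seconds). A minimal sketch, using Elixir's built-in Agent as a stand-in worker under a hypothetical name, :flaky_worker: with max_restarts: 1, the first crash is repaired, but a second crash inside the five-second window makes the supervisor terminate its children and then exit itself with reason :shutdown.

```elixir
# Trap exits so this script survives when the linked supervisor gives up.
Process.flag(:trap_exit, true)

# One worker: an Agent holding 0, registered under a hypothetical name.
children = [
  %{id: :flaky_worker, start: {Agent, :start_link, [fn -> 0 end, [name: :flaky_worker]]}}
]

# Allow at most 1 restart in any 5-second window.
{:ok, sup} =
  Supervisor.start_link(children, strategy: :one_for_one, max_restarts: 1, max_seconds: 5)

# First crash: within the limit, so the supervisor restarts the worker.
first_pid = Process.whereis(:flaky_worker)
Process.exit(first_pid, :kill)
Process.sleep(100)
second_pid = Process.whereis(:flaky_worker)

# Second crash inside the window: the limit is exceeded, so the supervisor
# shuts down its children and then itself.
Process.exit(second_pid, :kill)

supervisor_exit_reason =
  receive do
    {:EXIT, ^sup, reason} -> reason
  after
    1_000 -> :still_running
  end

IO.inspect({first_pid != second_pid, supervisor_exit_reason})
```

The limit is what stops a worker that crashes on every start from spinning forever; once it is exceeded, the problem is escalated to whoever supervises the supervisor.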

 

You can write supervisors as separate modules, but the Elixir style is to include them inline.
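For comparison, here is a minimal sketch of one supervisor written both ways. Demo.Worker and Demo.Sup are hypothetical names used only for this illustration: the module style implements the init/1 callback and returns Supervisor.init/2, while the inline style passes the child list straight to Supervisor.start_link/2.

```elixir
# A trivial, hypothetical worker so there is something to supervise.
defmodule Demo.Worker do
  use GenServer

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg, name: __MODULE__)
  def init(arg), do: {:ok, arg}
end

# Style 1: the supervisor as a separate module.
defmodule Demo.Sup do
  use Supervisor

  def start_link(arg), do: Supervisor.start_link(__MODULE__, arg, name: __MODULE__)

  def init(_arg) do
    Supervisor.init([{Demo.Worker, :hello}], strategy: :one_for_one)
  end
end

{:ok, module_sup} = Demo.Sup.start_link(nil)
module_children = Supervisor.which_children(module_sup)

# Stop it so the inline version can register Demo.Worker again.
:ok = Supervisor.stop(module_sup)

# Style 2: the same supervisor, declared inline.
{:ok, inline_sup} = Supervisor.start_link([{Demo.Worker, :hello}], strategy: :one_for_one)
inline_children = Supervisor.which_children(inline_sup)
```

Both produce the same tree; the inline form simply saves a module when the supervisor needs no extra logic.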

Creating a Supervisor

The easiest way to get started is to create your project with the --sup flag.

Excerpt From: Dave Thomas. “Programming Elixir 1.2 (for Shane Emmons).” iBooks. 

$ mix new --sup sequence
* creating README.md
* creating .gitignore
* creating mix.exs
* creating config
* creating config/config.exs
* creating lib
* creating lib/sequence.ex
* creating test
* creating test/test_helper.exs
* creating test/sequence_test.exs

Sequence Supervisor

sequence/lib/sequence.ex

defmodule Sequence.Application do
  @moduledoc false

  use Application

  def start(_type, _args) do
    children = [
      { Sequence.Server, 123 },
    ]

    opts = [strategy: :one_for_one, name: Sequence.Supervisor]
    Supervisor.start_link(children, opts)
  end
end
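Mix knows to call this start/2 function because mix new --sup also adds a mod: entry to the project's mix.exs. The generated application/0 function looks roughly like this (the exact contents may vary between Elixir versions):

```elixir
# In mix.exs, inside the project module generated by mix new --sup
def application do
  [
    extra_applications: [:logger],
    # Start Sequence.Application (calling its start/2) when the app boots
    mod: {Sequence.Application, []}
  ]
end
```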

Sequence.Server

defmodule Sequence.Server do
  use GenServer

  #####
  # External API  
  def start_link(current_number) do
    GenServer.start_link(__MODULE__, current_number, name: __MODULE__)
  end

  def next_number do
    GenServer.call __MODULE__, :next_number
  end

  def increment_number(delta) do
    GenServer.cast __MODULE__, {:increment_number, delta}
  end

  #####
  # GenServer implementation
  def init(current_number) do
    { :ok, current_number }
  end

  def handle_call(:next_number, _from, current_number) do
    { :reply, current_number, current_number+1 }
  end

  def handle_cast({:increment_number, delta}, current_number) do
    { :noreply, current_number + delta}
  end
end

Sequence Supervisor

  • When our application starts, the start function is called.
  • It creates a list of child specifications. In our case, we want to start Sequence.Server and pass it the parameter 123.
  • We call Supervisor.start_link, passing it the list of child specifications and a set of options. This creates a supervisor process.
  • Now our supervisor process calls the start_link function for each of its managed children. In our case, this is the function in Sequence.Server. This code is unchanged—it calls GenServer.start_link to create a GenServer process.
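The tuple {Sequence.Server, 123} is shorthand: the supervisor calls Sequence.Server.child_spec(123), a function that use GenServer defines for us, and the resulting map names the function used to start the child. The same function is called again, with the same argument, on every restart, which is why a restarted Sequence.Server starts over at 123. A minimal sketch with a hypothetical Demo.Server module:

```elixir
defmodule Demo.Server do
  # `use GenServer` injects a default child_spec/1 into this module.
  use GenServer

  def start_link(current_number),
    do: GenServer.start_link(__MODULE__, current_number, name: __MODULE__)

  def init(current_number), do: {:ok, current_number}
end

# What the supervisor sees for the child {Demo.Server, 123}:
spec = Supervisor.child_spec({Demo.Server, 123}, [])
IO.inspect(spec)
```

The printed map contains at least an :id (the module name) and a :start tuple of module, function, and argument list.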

Sequence Supervisor

$ iex -S mix
Compiled lib/sequence.ex
Compiled lib/sequence/server.ex
Generated sequence app
iex> Sequence.Server.increment_number 3
:ok
iex> Sequence.Server.next_number
126

Sequence.Server with format_status

defmodule Sequence.Server do
  use GenServer

  #####
  # External API  
  def start_link(current_number) do
    GenServer.start_link(__MODULE__, current_number, name: __MODULE__)
  end

  def next_number do
    GenServer.call __MODULE__, :next_number
  end

  def increment_number(delta) do
    GenServer.cast __MODULE__, {:increment_number, delta}
  end

  #####
  # GenServer implementation
  def init(current_number) do
    { :ok, current_number }
  end

  def handle_call(:next_number, _from, current_number) do
    { :reply, current_number, current_number+1 }
  end

  def handle_cast({:increment_number, delta}, current_number) do
    { :noreply, current_number + delta}
  end

  def format_status(_reason, [ _pdict, state ]) do
    [data: [{'State', "My current state is '#{inspect state}', and I'm happy"}]] 
  end
end

Sequence Supervisor

# try to increment a non-number
iex(3)> Sequence.Server.increment_number "cat"
:ok
iex(4)> 14:22:06.269 [error] GenServer Sequence.Server terminating
Last message: {:"$gen_cast", {:increment_number, "cat"}}
State: [data: [{'State', "My current state is '127', and I'm happy"}]]
** (exit) an exception was raised:
    ** (ArithmeticError) bad argument in arithmetic expression
        (sequence) lib/sequence/server.ex:27: Sequence.Server.handle_cast/2
        (stdlib) gen_server.erl:599: :gen_server.handle_msg/5
        (stdlib) proc_lib.erl:239: :proc_lib.init_p_do_apply/3

# yay, it brought it back up, but it's back to the default value
iex(4)> Sequence.Server.next_number
123
iex(5)> Sequence.Server.next_number
124

Managing Process State Across Restarts

Our server is not stateless: it needs to remember the current number, and, as we just saw, a restart throws that number away.

We’ll write a separate worker process that can store and retrieve a value. We’ll call it our stash.

Our sequence server should be fairly robust, but we’ve already found one thing that crashes it.

The stash process must be more robust still: it must outlive the sequence server. So the stash is supervised as well, and our supervision tree grows to two workers, with the stash started before the sequence server.

Let's create a server to stash our number.

Stash Worker

defmodule Sequence.Stash do
  use GenServer
  @me __MODULE__
  
  # External API  
  def start_link(initial_number) do
    GenServer.start_link(__MODULE__, initial_number, name: @me)
  end

  def get() do
    GenServer.call(@me, { :get })
  end
  
  def update(new_number) do
    GenServer.cast(@me, { :update, new_number })
  end

  # GenServer implementation
  def init(initial_number) do
    { :ok, initial_number }
  end

  def handle_call({ :get }, _from, current_number ) do
    { :reply, current_number, current_number }
  end

  def handle_cast({ :update, new_number }, _current_number) do
    { :noreply, new_number }
  end
end
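Because the stash only holds a value, the same API could also be sketched with Elixir's built-in Agent, which wraps exactly this get/update pattern. This is an alternative to the GenServer version above, not what the chapter uses, and Demo.AgentStash is a hypothetical name. One difference: Agent.update/2 is synchronous, whereas the stash's update/1 is a cast.

```elixir
defmodule Demo.AgentStash do
  # `use Agent` provides a child_spec/1, so this module can be listed as a child.
  use Agent

  def start_link(initial_number) do
    Agent.start_link(fn -> initial_number end, name: __MODULE__)
  end

  # Return the stored number.
  def get, do: Agent.get(__MODULE__, fn number -> number end)

  # Replace the stored number; waits until the Agent has applied the change.
  def update(new_number), do: Agent.update(__MODULE__, fn _old -> new_number end)
end

{:ok, _pid} = Demo.AgentStash.start_link(123)
initial = Demo.AgentStash.get()
:ok = Demo.AgentStash.update(456)
updated = Demo.AgentStash.get()
```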

Managing Process State Across Restarts

Now that we have two servers, we need to supervise them.

Supervision Strategies

What happens if one of them crashes?
We have to choose a supervision strategy:

  • :one_for_one
    if a server dies, restart just that server
  • :one_for_all
    if a server dies, terminate all the other servers
    and restart them all
  • :rest_for_one
    if a server dies, the servers that follow it in the list of children are terminated, and then the dying server and those that were terminated are restarted.
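To see :rest_for_one in action without the Sequence modules, here is a small sketch with two Agents standing in for the stash and the server (the names :demo_stash and :demo_server are hypothetical). Killing the second child restarts only that child; killing the first child also restarts the one that follows it.

```elixir
children = [
  %{id: :demo_stash, start: {Agent, :start_link, [fn -> :stash end, [name: :demo_stash]]}},
  %{id: :demo_server, start: {Agent, :start_link, [fn -> :server end, [name: :demo_server]]}}
]

{:ok, _sup} = Supervisor.start_link(children, strategy: :rest_for_one)

# Helper returning the current pid of each child.
pids = fn -> {Process.whereis(:demo_stash), Process.whereis(:demo_server)} end

{stash_0, server_0} = pids.()

# Kill the later child: only it is restarted.
Process.exit(server_0, :kill)
Process.sleep(100)
{stash_1, server_1} = pids.()

# Kill the earlier child: it and every child after it are restarted.
Process.exit(stash_1, :kill)
Process.sleep(100)
{stash_2, server_2} = pids.()

IO.inspect(stash_1 == stash_0, label: "stash kept after server died")
IO.inspect(server_2 != server_1, label: "server restarted after stash died")
```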

Application

defmodule Sequence.Application do
  @moduledoc false

  use Application

  def start(_type, _args) do
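    children = [
      # The stash starts first; Sequence.Server reads its initial value from it.
      { Sequence.Stash,  123 },
      { Sequence.Server, nil },
    ]

    # :rest_for_one: if the stash dies, the server (listed after it) is restarted too.
    opts = [strategy: :rest_for_one, name: Sequence.Supervisor]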
    Supervisor.start_link(children, opts)
  end
end

Sequence Worker

defmodule Sequence.Server do
  use GenServer
  # External API  
  def start_link(_) do
    GenServer.start_link(__MODULE__, nil, name: __MODULE__)
  end

  def next_number do
    GenServer.call __MODULE__, :next_number
  end

  def increment_number(delta) do
    GenServer.cast __MODULE__, {:increment_number, delta}
  end

  # GenServer implementation
  # On every (re)start, fetch the last saved number from the stash.
  def init(_) do
    { :ok, Sequence.Stash.get() }
  end
  
  def handle_call(:next_number, _from, current_number) do
    { :reply, current_number, current_number+1 }
  end

  def handle_cast({:increment_number, delta}, current_number) do
    { :noreply, current_number + delta}
  end

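  # Called when a callback crashes: hand the last good state to the stash.
  # (Not called if the process is killed outright, e.g. with :kill.)
  def terminate(_reason, current_number) do
    Sequence.Stash.update(current_number)
  end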
end

Fire it up!

$ iex -S mix
iex> Sequence.Server.next_number
123
iex> Sequence.Server.next_number
124
iex> Sequence.Server.next_number
125
iex> Sequence.Server.increment_number "cat"
:ok
iex>
12:15:48.424 [error] GenServer Sequence.Server terminating
** (ArithmeticError) bad argument in arithmetic expression
   (sequence) lib/sequence/server.ex:39: Sequence.Server.handle_cast/2
Last message: {:"$gen_cast", {:increment_number, "cat"}}
State: 126
iex> Sequence.Server.next_number
126
iex> Sequence.Server.next_number
127

The server code crashed but was restarted automatically. Along the way, terminate/2 stored the last good state (126) in the stash, and the restarted server's init/1 read it back, so the sequence continued uninterrupted.
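The whole save-and-restore cycle can be condensed into one self-contained script. This is a sketch, not the chapter's code: an Agent registered as :demo_stash plays the role of Sequence.Stash, Demo.Counter is a hypothetical stand-in for Sequence.Server, and terminate/2 uses the synchronous Agent.update/2 instead of a cast, so the value is saved before the crashed process finishes exiting. Run as a script, it should follow the session above: 123, 124, 125, a crash, then 126.

```elixir
defmodule Demo.Counter do
  use GenServer

  def start_link(_arg), do: GenServer.start_link(__MODULE__, nil, name: __MODULE__)
  def next_number, do: GenServer.call(__MODULE__, :next_number)
  def increment_number(delta), do: GenServer.cast(__MODULE__, {:increment_number, delta})

  # On every (re)start, pick up the last saved number from the stash.
  def init(_arg), do: {:ok, Agent.get(:demo_stash, fn number -> number end)}

  def handle_call(:next_number, _from, n), do: {:reply, n, n + 1}
  def handle_cast({:increment_number, delta}, n), do: {:noreply, n + delta}

  # Called when a callback raises: save the last good state before exiting.
  def terminate(_reason, n), do: Agent.update(:demo_stash, fn _ -> n end)
end

children = [
  # The stash comes first, so :rest_for_one restarts the counter if the stash dies.
  %{id: :demo_stash, start: {Agent, :start_link, [fn -> 123 end, [name: :demo_stash]]}},
  Demo.Counter
]

{:ok, _sup} = Supervisor.start_link(children, strategy: :rest_for_one)

before_crash = for _ <- 1..3, do: Demo.Counter.next_number()

# Adding a string to a number raises ArithmeticError inside handle_cast/2.
Demo.Counter.increment_number("cat")
# Give the supervisor a moment to restart the counter.
Process.sleep(100)

resumed = Demo.Counter.next_number()
IO.inspect({before_crash, resumed})
```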

Supervisors Are the Heart of Reliability

This example was profound because it is a concrete representation of the idea of building rings of confidence in our code. The outer ring, where our code interacts with the world, should be as reliable as we can make it. But within that ring there are other, nested rings. And in those rings, things can be less than perfect. The trick is to ensure that the code in each ring knows how to deal with failures of the code in the next ring down.

And that’s where supervisors come into play.
But the real power of supervisors is that they exist. The fact that you use them to manage your workers means you are forced to think about reliability and state.

Thank you!

Programming Elixir 1.6 Chapter 19

By Dustin McCraw
