Elixir 1.6
Chapter 19
OTP: Supervisors
Supervisors
The Elixir way says not to worry much about code that crashes; instead, make sure the overall application keeps running.
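Imagine your application consists of hundreds or thousands of processes, each handling just a small part of a request. If one of those crashes, you might lose the request it was handling, but everything else carries on. Ideally, the crashed process is then restarted so the application returns to full strength.
In the Elixir and OTP worlds, supervisors perform all of this process monitoring and restarting.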
Supervisors and Workers
An Elixir supervisor has just one purpose—it manages one or more worker processes.
At its simplest, a supervisor is a process that uses the OTP supervisor behavior. It is given a list of processes to monitor and is told what to do if a process dies, and how to prevent restart loops.
You can write supervisors as separate modules, but the Elixir style is to include them inline.
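To make those three inputs concrete (the children, what to do when one dies, and a limit on restart loops), here is a minimal sketch. It assumes a named Agent as a stand-in worker; the name :demo_worker and the limit values are illustrative only. The :max_restarts and :max_seconds options cap how many restarts the supervisor tolerates in a time window before it gives up and exits itself.

```elixir
# A named Agent stands in for a real worker (:demo_worker is a made-up name).
children = [
  %{
    id: :demo_worker,
    start: {Agent, :start_link, [fn -> 0 end, [name: :demo_worker]]}
  }
]

# Tolerate at most 3 restarts within any 5-second window.
{:ok, _sup} =
  Supervisor.start_link(children,
    strategy: :one_for_one,
    max_restarts: 3,
    max_seconds: 5
  )

old_pid = Process.whereis(:demo_worker)
ref = Process.monitor(old_pid)

# Simulate a crash and wait until the worker is really gone.
Process.exit(old_pid, :kill)

receive do
  {:DOWN, ^ref, :process, ^old_pid, _reason} -> :ok
end

# Poll until the supervisor has registered a replacement under the same name.
new_pid =
  Stream.repeatedly(fn ->
    Process.sleep(10)
    Process.whereis(:demo_worker)
  end)
  |> Enum.find(fn pid -> pid not in [nil, old_pid] end)

IO.inspect(new_pid != old_pid, label: "restarted with a new pid")
```

The replacement is a brand new process, so any state held by the old one is gone unless it was saved somewhere else.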
Creating a Supervisor
The easiest way to get started is to create your project with the --sup flag.
Excerpt From: Dave Thomas. “Programming Elixir 1.2 (for Shane Emmons).” iBooks.
$ mix new --sup sequence
* creating README.md
* creating .gitignore
* creating mix.exs
* creating config
* creating config/config.exs
* creating lib
* creating lib/sequence.ex
* creating test
* creating test/test_helper.exs
* creating test/sequence_test.exs
Sequence Supervisor
sequence/lib/sequence.ex
defmodule Sequence.Application do
  @moduledoc false

  use Application

  def start(_type, _args) do
    children = [
      { Sequence.Server, 123 },
    ]

    opts = [strategy: :one_for_one, name: Sequence.Supervisor]
    Supervisor.start_link(children, opts)
  end
end
Sequence.Server
defmodule Sequence.Server do
  use GenServer

  #####
  # External API

  def start_link(current_number) do
    GenServer.start_link(__MODULE__, current_number, name: __MODULE__)
  end

  def next_number do
    GenServer.call __MODULE__, :next_number
  end

  def increment_number(delta) do
    GenServer.cast __MODULE__, {:increment_number, delta}
  end

  #####
  # GenServer implementation

  def handle_call(:next_number, _from, current_number) do
    { :reply, current_number, current_number+1 }
  end

  def handle_cast({:increment_number, delta}, current_number) do
    { :noreply, current_number + delta}
  end
end
Sequence Supervisor
- When our application starts, the start function is called.
- It creates a list of child servers. In our case, we want to start Sequence.Server and pass it the parameter 123.
- We call Supervisor.start_link, passing it the list of child specifications and a set of options. This creates a supervisor process.
- Now our supervisor process calls the start_link function for each of its managed children. In our case, this is the function in Sequence.Server. This code is unchanged—it calls GenServer.start_link to create a GenServer process.
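The steps above rely on the supervisor turning each { module, argument } tuple into a child specification. A small sketch of that expansion follows. ChildSpecDemo.Server is a hypothetical module used only for illustration; it relies on the child_spec/1 function that use GenServer defines for us.

```elixir
defmodule ChildSpecDemo.Server do
  use GenServer

  def start_link(initial) do
    GenServer.start_link(__MODULE__, initial)
  end

  @impl true
  def init(initial), do: {:ok, initial}
end

# What the supervisor sees when the children list contains {ChildSpecDemo.Server, 123}.
spec = Supervisor.child_spec({ChildSpecDemo.Server, 123}, [])

# The id identifies the child within its supervisor.
IO.inspect(spec.id, label: "id")

# The start tuple is {module, function, args}: the supervisor calls
# ChildSpecDemo.Server.start_link(123) to start (and later restart) the child.
IO.inspect(spec.start, label: "start")
```

The argument in the tuple is passed unchanged to start_link, which is why Sequence.Server.start_link receives 123.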
Sequence Supervisor
$ iex -S mix
Compiled lib/sequence.ex
Compiled lib/sequence/server.ex
Generated sequence app
iex> Sequence.Server.increment_number 3
:ok
iex> Sequence.Server.next_number
126
defmodule Sequence.Server do
  use GenServer

  #####
  # External API

  def start_link(current_number) do
    GenServer.start_link(__MODULE__, current_number, name: __MODULE__)
  end

  def next_number do
    GenServer.call __MODULE__, :next_number
  end

  def increment_number(delta) do
    GenServer.cast __MODULE__, {:increment_number, delta}
  end

  #####
  # GenServer implementation

  def handle_call(:next_number, _from, current_number) do
    { :reply, current_number, current_number+1 }
  end

  def handle_cast({:increment_number, delta}, current_number) do
    { :noreply, current_number + delta}
  end

  def format_status(_reason, [ _pdict, state ]) do
    [data: [{'State', "My current state is '#{inspect state}', and I'm happy"}]]
  end
end
Sequence Supervisor
# try to increment a non-number
iex(3)> Sequence.Server.increment_number "cat"
:ok
iex(4)> 14:22:06.269 [error] GenServer Sequence.Server terminating
Last message: {:"$gen_cast", {:increment_number, "cat"}}
State: [data: [{'State', "My current state is '127', and I'm happy"}]]
** (exit) an exception was raised:
** (ArithmeticError) bad argument in arithmetic expression
(sequence) lib/sequence/server.ex:27: Sequence.Server.handle_cast/2
(stdlib) gen_server.erl:599: :gen_server.handle_msg/5
(stdlib) proc_lib.erl:239: :proc_lib.init_p_do_apply/3
# yay, it brought it back up, but it's back to the default value
iex(4)> Sequence.Server.next_number
123
iex(5)> Sequence.Server.next_number
124
Managing Process State Across Restarts
Our server is not stateless—it needs to remember the current number.
We’ll write a separate worker process that can store and retrieve a value. We’ll call it our stash.
Our sequence server should be fairly robust, but we’ve already found one thing that crashes it. Our stash process must be more robust still: it must outlive the sequence server.
We have to supervise our stash separately, so we will create a supervision tree.
Let's create a server to stash our number.
Stash Worker
defmodule Sequence.Stash do
  use GenServer

  @me __MODULE__

  # External API

  def start_link(initial_number) do
    GenServer.start_link(__MODULE__, initial_number, name: @me)
  end

  def get() do
    GenServer.call(@me, { :get })
  end

  def update(new_number) do
    GenServer.cast(@me, { :update, new_number })
  end

  # GenServer implementation

  def init(initial_number) do
    { :ok, initial_number }
  end

  def handle_call({ :get }, _from, current_number ) do
    { :reply, current_number, current_number }
  end

  def handle_cast({ :update, new_number }, _current_number) do
    { :noreply, new_number }
  end
end
Managing Process State Across Restarts
Now that we have two servers, we need to supervise them.
Supervision Strategies
What happens if one of them crashes?
We have to choose a supervision strategy:
- :one_for_one (default): if a server dies, restart it.
- :one_for_all: if a server dies, terminate all the remaining servers and restart them all.
- :rest_for_one: if a server dies, the servers that follow it in the list of children are terminated, and then the dying server and those that were terminated are restarted.
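The effect of :rest_for_one is easiest to see by watching process ids. The sketch below is an illustration only: it assumes two named Agents, :first and :second (hypothetical names), in place of real servers, and a polling helper, await_new, that is not part of any library.

```elixir
# Build a child spec for a named Agent; the order in the list matters.
agent = fn name ->
  %{id: name, start: {Agent, :start_link, [fn -> name end, [name: name]]}}
end

{:ok, _sup} =
  Supervisor.start_link([agent.(:first), agent.(:second)], strategy: :rest_for_one)

# Poll until `name` is registered to a pid other than `old_pid`.
await_new = fn name, old_pid ->
  Stream.repeatedly(fn ->
    Process.sleep(10)
    Process.whereis(name)
  end)
  |> Enum.find(fn pid -> pid not in [nil, old_pid] end)
end

first = Process.whereis(:first)
second = Process.whereis(:second)

# Killing the LAST child restarts only that child.
Process.exit(second, :kill)
second_after = await_new.(:second, second)
IO.inspect(Process.whereis(:first) == first, label: ":first untouched")

# Killing the FIRST child also terminates and restarts everything after it.
Process.exit(first, :kill)
first_after = await_new.(:first, first)
second_again = await_new.(:second, second_after)
IO.inspect(second_again != second_after, label: ":second restarted too")
```

This is why the order of the children list matters: a child that must survive the crashes of the others belongs before them.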
Application
defmodule Sequence.Application do
  @moduledoc false

  use Application

  def start(_type, _args) do
    children = [
      { Sequence.Stash, 123},
      { Sequence.Server, nil},
    ]

    opts = [strategy: :rest_for_one, name: Sequence.Supervisor]
    Supervisor.start_link(children, opts)
  end
end
Sequence Worker
defmodule Sequence.Server do
  use GenServer

  # External API

  def start_link(_) do
    GenServer.start_link(__MODULE__, nil, name: __MODULE__)
  end

  def next_number do
    GenServer.call __MODULE__, :next_number
  end

  def increment_number(delta) do
    GenServer.cast __MODULE__, {:increment_number, delta}
  end

  # GenServer implementation

  def init(_) do
    { :ok, Sequence.Stash.get() }
  end

  def handle_call(:next_number, _from, current_number) do
    { :reply, current_number, current_number+1 }
  end

  def handle_cast({:increment_number, delta}, current_number) do
    { :noreply, current_number + delta}
  end

  def terminate(_reason, current_number) do
    Sequence.Stash.update(current_number)
  end
end
Fire it up!
$ iex -S mix
iex> Sequence.Server.next_number
123
iex> Sequence.Server.next_number
124
iex> Sequence.Server.next_number
125
iex> Sequence.Server.increment_number "cat"
:ok
iex>
12:15:48.424 [error] GenServer Sequence.Server terminating
** (ArithmeticError) bad argument in arithmetic expression
(sequence) lib/sequence/server.ex:39: Sequence.Server.handle_cast/2
Last message: {:"$gen_cast", {:increment_number, "cat"}}
State: 126
iex> Sequence.Server.next_number
126
iex> Sequence.Server.next_number
127
The server code crashed, but was then restarted automatically. And, in the process, the state was stored away in the stash and then recovered—the sequence continued uninterrupted.
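The recovery depends on terminate/2 running before the server goes away. A minimal sketch of when that happens follows, using a hypothetical TerminateDemo module that reports back to the calling process instead of writing to a stash. It compares a callback that raises (as in the "cat" example) with an untrappable :kill signal. Expect an error report in the log for the first case.

```elixir
defmodule TerminateDemo do
  use GenServer

  # The state is simply the pid to notify when terminate/2 runs.
  @impl true
  def init(notify), do: {:ok, notify}

  @impl true
  def handle_cast(:crash, _notify), do: raise("boom")

  @impl true
  def terminate(_reason, notify), do: send(notify, :terminate_ran)
end

# Report whether terminate/2 notified us within half a second.
terminate_ran? = fn ->
  receive do
    :terminate_ran -> true
  after
    500 -> false
  end
end

# Case 1: a callback raises, so terminate/2 is called before the process exits.
{:ok, pid} = GenServer.start(TerminateDemo, self())
GenServer.cast(pid, :crash)
after_raise = terminate_ran?.()

# Case 2: a :kill signal cannot be trapped, so terminate/2 never runs.
{:ok, pid} = GenServer.start(TerminateDemo, self())
Process.exit(pid, :kill)
after_kill = terminate_ran?.()

IO.inspect({after_raise, after_kill}, label: "terminate/2 ran?")
```

In the second case nothing would reach the stash, so a server that must never lose state might also update the stash whenever its state changes. That is a design option, not something the sequence example does.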
Supervisors Are the Heart of Reliability
This example is profound because it is a concrete representation of the idea of building rings of confidence in our code. The outer ring, where our code interacts with the world, should be as reliable as we can make it. But within that ring there are other, nested rings. And in those rings, things can be less than perfect. The trick is to ensure that the code in each ring knows how to deal with failures of the code in the next ring down.
And that’s where supervisors come into play.
But the real power of supervisors is that they exist. The fact that you use them to manage your workers means you are forced to think about reliability and state.
Thank you!
Programming Elixir 1.6 Chapter 19
By Dustin McCraw