The Elixir way says not to worry much about code that crashes; instead, make sure the overall application keeps running.
Imagine your application consists of hundreds or thousands of processes, each handling just a small part of a request. If one of those crashes, everything else carries on.
In the Elixir and OTP worlds, supervisors perform all of this process monitoring and restarting.
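An Elixir supervisor has just one purpose: it manages one or more child processes. These are usually workers, but a supervisor can also manage other supervisors, which is how supervision trees are built.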
At its simplest, a supervisor is a process that uses the OTP supervisor behavior. It is given a list of processes to monitor and is told what to do if a process dies, and how to prevent restart loops.
You can write supervisors as separate modules, but the Elixir style is to include them inline.
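Here is a minimal sketch of a supervisor written as a separate module. It assumes two hypothetical modules, Demo.Worker and Demo.Supervisor, which are not part of the Sequence project. It shows where the list of children, the restart strategy, and the restart-loop limits go. The max_restarts and max_seconds values are Elixir's defaults, spelled out here only for illustration.

```elixir
defmodule Demo.Worker do
  # Hypothetical worker, here only so the supervisor has something to manage.
  use GenServer

  def start_link(initial), do: GenServer.start_link(__MODULE__, initial, name: __MODULE__)

  @impl true
  def init(initial), do: {:ok, initial}

  @impl true
  def handle_call(:get, _from, state), do: {:reply, state, state}
end

defmodule Demo.Supervisor do
  # Hypothetical supervisor written as its own module rather than inline in an Application.
  use Supervisor

  def start_link(initial), do: Supervisor.start_link(__MODULE__, initial, name: __MODULE__)

  @impl true
  def init(initial) do
    # The processes to monitor.
    children = [{Demo.Worker, initial}]

    # :strategy says what to do when a child dies.
    # :max_restarts / :max_seconds prevent restart loops: if more than 3 restarts
    # happen within 5 seconds, the supervisor stops its children and exits itself.
    Supervisor.init(children, strategy: :one_for_one, max_restarts: 3, max_seconds: 5)
  end
end

{:ok, sup} = Demo.Supervisor.start_link(42)
IO.inspect(Supervisor.which_children(sup))
```

The inline style used in the rest of this section skips the init/1 callback. It passes the same children list and options straight to Supervisor.start_link/2.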
The easiest way to get started is to create your project with the --sup flag.
Excerpt From: Dave Thomas. “Programming Elixir 1.2 (for Shane Emmons).” iBooks.
$ mix new --sup sequence
* creating README.md
* creating .gitignore
* creating mix.exs
* creating config
* creating config/config.exs
* creating lib
* creating lib/sequence.ex
* creating test
* creating test/test_helper.exs
* creating test/sequence_test.exs
sequence/lib/sequence.ex
defmodule Sequence.Application do
  @moduledoc false
  use Application
  def start(_type, _args) do
    children = [
      { Sequence.Server, 123 },
    ]
    opts = [strategy: :one_for_one, name: Sequence.Supervisor]
    Supervisor.start_link(children, opts)
  end
end
defmodule Sequence.Server do
  use GenServer
  #####
  # External API
  def start_link(current_number) do
    GenServer.start_link(__MODULE__, current_number, name: __MODULE__)
  end
  def next_number do
    GenServer.call __MODULE__, :next_number
  end
  def increment_number(delta) do
    GenServer.cast __MODULE__, {:increment_number, delta}
  end
  #####
  # GenServer implementation
  # init/1 turns the argument given to start_link/1 into the initial state.
  def init(initial_number) do
    { :ok, initial_number }
  end
  def handle_call(:next_number, _from, current_number) do
    { :reply, current_number, current_number+1 }
  end
  def handle_cast({:increment_number, delta}, current_number) do
    { :noreply, current_number + delta}
  end
end
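The { Sequence.Server, 123 } entry in the children list is shorthand for a child specification. The sketch below uses a hypothetical Demo.Counter module in place of Sequence.Server. It shows that use GenServer defines a child_spec/1 function, and the map it returns tells the supervisor to start the child by calling start_link(123). Passing that map to the supervisor directly should have the same effect as passing the tuple.

```elixir
defmodule Demo.Counter do
  # Hypothetical stand-in for Sequence.Server.
  use GenServer

  def start_link(n), do: GenServer.start_link(__MODULE__, n, name: __MODULE__)
  def current, do: GenServer.call(__MODULE__, :current)

  @impl true
  def init(n), do: {:ok, n}

  @impl true
  def handle_call(:current, _from, n), do: {:reply, n, n}
end

# `use GenServer` injected child_spec/1; this is what {Demo.Counter, 123} expands to.
spec = Demo.Counter.child_spec(123)
IO.inspect(spec)

# Supplying the map itself starts the child exactly as the tuple form would.
{:ok, _sup} = Supervisor.start_link([spec], strategy: :one_for_one)
IO.inspect(Demo.Counter.current())
```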
$ iex -S mix
Compiled lib/sequence.ex
Compiled lib/sequence/server.ex
Generated sequence app
iex> Sequence.Server.increment_number 3
:ok
iex> Sequence.Server.next_number
126
To make crash reports easier to read, the server can also implement the optional format_status/2 callback, which controls how its state appears in error reports:
defmodule Sequence.Server do
  use GenServer
  #####
  # External API
  def start_link(current_number) do
    GenServer.start_link(__MODULE__, current_number, name: __MODULE__)
  end
  def next_number do
    GenServer.call __MODULE__, :next_number
  end
  def increment_number(delta) do
    GenServer.cast __MODULE__, {:increment_number, delta}
  end
  #####
  # GenServer implementation
  def init(initial_number) do
    { :ok, initial_number }
  end
  def handle_call(:next_number, _from, current_number) do
    { :reply, current_number, current_number+1 }
  end
  def handle_cast({:increment_number, delta}, current_number) do
    { :noreply, current_number + delta}
  end
  # Optional callback: its return value is shown in place of the raw state in error reports.
  def format_status(_reason, [ _pdict, state ]) do
    [data: [{'State', "My current state is '#{inspect state}', and I'm happy"}]]
  end
end
# try to increment a non-number
iex(3)> Sequence.Server.increment_number "cat"
:ok
iex(4)> 14:22:06.269 [error] GenServer Sequence.Server terminating
Last message: {:"$gen_cast", {:increment_number, "cat"}}
State: [data: [{'State', "My current state is '127', and I'm happy"}]]
** (exit) an exception was raised:
** (ArithmeticError) bad argument in arithmetic expression
(sequence) lib/sequence/server.ex:27: Sequence.Server.handle_cast/2
(stdlib) gen_server.erl:599: :gen_server.handle_msg/5
(stdlib) proc_lib.erl:239: :proc_lib.init_p_do_apply/3
# the supervisor brought the server back up, but its state is back to the initial value, 123
iex(4)> Sequence.Server.next_number
123
iex(5)> Sequence.Server.next_number
124
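The crash-and-restart behaviour in the session above can be reproduced as a standalone script. This is a sketch that assumes two hypothetical modules. Demo.Sequencer has the same call/cast API as Sequence.Server. Demo.Wait polls until the supervisor has registered a new process under the server's name. The polling is needed because the restart happens asynchronously, so calling the server right after the bad cast could reach the dead process.

```elixir
defmodule Demo.Sequencer do
  # Hypothetical stand-in for Sequence.Server (no stash yet).
  use GenServer

  def start_link(n), do: GenServer.start_link(__MODULE__, n, name: __MODULE__)
  def next_number, do: GenServer.call(__MODULE__, :next_number)
  def increment_number(delta), do: GenServer.cast(__MODULE__, {:increment_number, delta})

  @impl true
  def init(n), do: {:ok, n}

  @impl true
  def handle_call(:next_number, _from, n), do: {:reply, n, n + 1}

  @impl true
  def handle_cast({:increment_number, delta}, n), do: {:noreply, n + delta}
end

defmodule Demo.Wait do
  # Hypothetical helper: poll until `name` is registered to a pid other than `old_pid`.
  def for_restart(name, old_pid) do
    case Process.whereis(name) do
      pid when is_pid(pid) and pid != old_pid ->
        pid

      _ ->
        Process.sleep(10)
        for_restart(name, old_pid)
    end
  end
end

{:ok, _sup} = Supervisor.start_link([{Demo.Sequencer, 123}], strategy: :one_for_one)

Demo.Sequencer.increment_number(3)
before_crash = Demo.Sequencer.next_number()

# Adding a string to the integer state raises ArithmeticError inside handle_cast/2.
old_pid = Process.whereis(Demo.Sequencer)
Demo.Sequencer.increment_number("cat")
Demo.Wait.for_restart(Demo.Sequencer, old_pid)

# The supervisor restarted the server with its original start argument.
after_crash = Demo.Sequencer.next_number()
IO.inspect({before_crash, after_crash})
```

After the ArithmeticError report appears in the log, the script reports 126 before the crash and 123 after it. The restarted process knows only the argument it was originally started with.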
Our server is not stateless: it needs to remember the current number, and, as the session above shows, a restart loses it.
We’ll write a separate worker process that can store and retrieve a value. We’ll call it our stash.
Our sequence server should be fairly robust, but we’ve already found one thing that crashes it. The stash process must be more robust: it must outlive the sequence server.
So the stash has to be supervised in a way that keeps it running when the sequence server crashes, and to do that we will build a supervision tree.
Let's create a server to stash our number.
defmodule Sequence.Stash do
  use GenServer
  @me __MODULE__
  # External API
  def start_link(initial_number) do
    GenServer.start_link(__MODULE__, initial_number, name: @me)
  end
  def get() do
    GenServer.call(@me, { :get })
  end
  def update(new_number) do
    GenServer.cast(@me, { :update, new_number })
  end
  # GenServer implementation
  def init(initial_number) do
    { :ok, initial_number }
  end
  def handle_call({ :get }, _from, current_number ) do
    { :reply, current_number, current_number }
  end
  def handle_cast({ :update, new_number }, _current_number) do
    { :noreply, new_number }
  end
end
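This small sketch runs the stash on its own. It assumes a hypothetical Demo.Stash module that copies Sequence.Stash under a different name so it can run outside the project. It suggests why update/1 can be a cast. Messages sent from one process to another are delivered in the order they were sent, so a get/0 issued by the same caller after an update/1 sees the new value.

```elixir
defmodule Demo.Stash do
  # Hypothetical copy of Sequence.Stash, renamed so it runs as a standalone script.
  use GenServer
  @me __MODULE__

  def start_link(initial_number), do: GenServer.start_link(__MODULE__, initial_number, name: @me)
  def get, do: GenServer.call(@me, {:get})
  def update(new_number), do: GenServer.cast(@me, {:update, new_number})

  @impl true
  def init(initial_number), do: {:ok, initial_number}

  @impl true
  def handle_call({:get}, _from, current_number), do: {:reply, current_number, current_number}

  @impl true
  def handle_cast({:update, new_number}, _current_number), do: {:noreply, new_number}
end

{:ok, _pid} = Demo.Stash.start_link(123)

# The cast returns :ok at once; because it was sent before the call below,
# the stash handles the update first.
:ok = Demo.Stash.update(200)
value = Demo.Stash.get()
IO.inspect(value)
```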
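Now that we have two servers, we need to supervise them.
What happens if one of them crashes?
We have to choose a supervision strategy. Here we use :rest_for_one: if a server dies, the servers that follow it in the list of children are terminated, and then the dying server and those that were terminated are restarted.
The order of the children therefore matters. Because the stash is listed before the sequence server, a crash in the server restarts only the server, while a crash in the stash restarts both.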
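Here is a sketch that makes the :rest_for_one rule concrete. It uses three hypothetical children registered as :a, :b, and :c. The Demo.Named and Demo.Pids modules are illustrative only and are not part of the Sequence project. Killing the middle child should restart it and the child after it, while the child before it keeps its original pid.

```elixir
defmodule Demo.Named do
  # Hypothetical worker that registers itself under the atom it is given.
  use GenServer

  def start_link(name), do: GenServer.start_link(__MODULE__, name, name: name)

  @impl true
  def init(name), do: {:ok, name}
end

defmodule Demo.Pids do
  # Hypothetical helper: map each child id to its current pid.
  def current(sup) do
    for {id, pid, _type, _modules} <- Supervisor.which_children(sup), into: %{}, do: {id, pid}
  end

  # Poll until child `id` is running under a pid other than `old_pid`.
  def wait_for_new(sup, id, old_pid) do
    case current(sup) do
      %{^id => pid} = pids when is_pid(pid) and pid != old_pid ->
        pids

      _ ->
        Process.sleep(10)
        wait_for_new(sup, id, old_pid)
    end
  end
end

children =
  for name <- [:a, :b, :c] do
    # Each child needs a distinct id, because all three use the same module.
    Supervisor.child_spec({Demo.Named, name}, id: name)
  end

{:ok, sup} = Supervisor.start_link(children, strategy: :rest_for_one)
before = Demo.Pids.current(sup)

# Kill the middle child: :c (listed after it) is restarted too, :a is left alone.
Process.exit(before.b, :kill)
after_restart = Demo.Pids.wait_for_new(sup, :b, before.b)

IO.inspect({before.a == after_restart.a, before.c != after_restart.c})
```

The helper polls because the supervisor handles the exit in its own process, asynchronously from the caller. It performs the whole restart sequence before answering the next which_children request.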
defmodule Sequence.Application do
  @moduledoc false
  use Application
  def start(_type, _args) do
    children = [
      { Sequence.Stash, 123},
      { Sequence.Server, nil},
    ]
    opts = [strategy: :rest_for_one, name: Sequence.Supervisor]
    Supervisor.start_link(children, opts)
  end
end
defmodule Sequence.Server do
  use GenServer
  # External API
  def start_link(_) do
    GenServer.start_link(__MODULE__, nil, name: __MODULE__)
  end
  def next_number do
    GenServer.call __MODULE__, :next_number
  end
  def increment_number(delta) do
    GenServer.cast __MODULE__, {:increment_number, delta}
  end
  # GenServer implementation
  # On every start, including restarts, fetch the saved number from the stash.
  def init(_) do
    { :ok, Sequence.Stash.get() }
  end
  def handle_call(:next_number, _from, current_number) do
    { :reply, current_number, current_number+1 }
  end
  def handle_cast({:increment_number, delta}, current_number) do
    { :noreply, current_number + delta}
  end
  # Called when the server is about to exit (for example, after an exception in a
  # callback); save the current number so the restarted server can resume from it.
  def terminate(_reason, current_number) do
    Sequence.Stash.update(current_number)
  end
end
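The whole arrangement can be exercised as one script. This is a minimal sketch that assumes hypothetical Demo.Stash, Demo.Server, and Demo.Wait modules. They mirror Sequence.Stash and the stash-backed Sequence.Server under different names. The script replays the iex session below as assertions instead of typed commands.

```elixir
defmodule Demo.Stash do
  # Hypothetical stand-in for Sequence.Stash: holds a number across server crashes.
  use GenServer

  def start_link(initial), do: GenServer.start_link(__MODULE__, initial, name: __MODULE__)
  def get, do: GenServer.call(__MODULE__, {:get})
  def update(n), do: GenServer.cast(__MODULE__, {:update, n})

  @impl true
  def init(initial), do: {:ok, initial}

  @impl true
  def handle_call({:get}, _from, n), do: {:reply, n, n}

  @impl true
  def handle_cast({:update, n}, _old), do: {:noreply, n}
end

defmodule Demo.Server do
  # Hypothetical stand-in for the stash-backed Sequence.Server.
  use GenServer

  def start_link(_), do: GenServer.start_link(__MODULE__, nil, name: __MODULE__)
  def next_number, do: GenServer.call(__MODULE__, :next_number)
  def increment_number(delta), do: GenServer.cast(__MODULE__, {:increment_number, delta})

  # On every (re)start, pick up the number saved in the stash.
  @impl true
  def init(_), do: {:ok, Demo.Stash.get()}

  @impl true
  def handle_call(:next_number, _from, n), do: {:reply, n, n + 1}

  @impl true
  def handle_cast({:increment_number, delta}, n), do: {:noreply, n + delta}

  # Runs here because the crash is an exception raised inside a callback;
  # it is not called if the process is killed with :kill.
  @impl true
  def terminate(_reason, n), do: Demo.Stash.update(n)
end

defmodule Demo.Wait do
  # Hypothetical helper: poll until `name` is registered to a pid other than `old_pid`.
  def for_restart(name, old_pid) do
    case Process.whereis(name) do
      pid when is_pid(pid) and pid != old_pid ->
        pid

      _ ->
        Process.sleep(10)
        for_restart(name, old_pid)
    end
  end
end

# The stash comes first, so a server crash does not restart it.
children = [{Demo.Stash, 123}, {Demo.Server, nil}]
{:ok, _sup} = Supervisor.start_link(children, strategy: :rest_for_one)

first_three = for _ <- 1..3, do: Demo.Server.next_number()

old_pid = Process.whereis(Demo.Server)
Demo.Server.increment_number("cat")
Demo.Wait.for_restart(Demo.Server, old_pid)

next_two = for _ <- 1..2, do: Demo.Server.next_number()
IO.inspect({first_three, next_two})
```

terminate/2 saves 126 to the stash before the crashed process exits. As a result, the restarted server's init/1 picks up 126 instead of starting over at 123.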
$ iex -S mix
iex> Sequence.Server.next_number
123
iex> Sequence.Server.next_number
124
iex> Sequence.Server.next_number
125
iex> Sequence.Server.increment_number "cat"
:ok
iex>
12:15:48.424 [error] GenServer Sequence.Server terminating
** (ArithmeticError) bad argument in arithmetic expression
(sequence) lib/sequence/server.ex:39: Sequence.Server.handle_cast/2
Last message: {:"$gen_cast", {:increment_number, "cat"}}
State: 126
iex> Sequence.Server.next_number
126
iex> Sequence.Server.next_number
127
The server code crashed, but was then restarted automatically. And, in the process, the state was stored away in the stash and then recovered—the sequence continued uninterrupted.
This example is profound because it is a concrete representation of the idea of building rings of confidence in our code. The outer ring, where our code interacts with the world, should be as reliable as we can make it. But within that ring there are other, nested rings. And in those rings, things can be less than perfect. The trick is to ensure that the code in each ring knows how to deal with failures of the code in the next ring down.
And that’s where supervisors come into play.
But the real power of supervisors is that they exist. The fact that you use them to manage your workers means you are forced to think about reliability and state.