Keeping tabs on production

Hubert Łępicki
CODE BEAM LITE AMSTERDAM

28 November 2019

@hubertlepicki

https://slides.com/

hubertlepicki

Today I will focus on

  • Debugging production issues
  • Monitoring
  • Bunch of semi-random tips and tricks
  • From Elixir perspective but heavy on Erlang tools

Most important thing when running a business?

Problem

$ mix test
Randomized with seed 145268
==> myapp
.........................................................
.........................................................
.........................................................
................................                                                                                                                                                           

Finished in 20.6 seconds
30 doctests, 165 tests, 0 failures

Bugs are often difficult to reproduce

What happens on production?

Reconstruct user journey

Ask for more information

  • Who is the issue happening to?
  • What is the URL the issue happens at?
  • Can we have screenshot with the issue visible?

We check the logs

Logger

Logger in Elixir & Phoenix

config :logger,
  level: :info,
  handle_sasl_reports: true,
  backends: [:console]

config :logger, :console,
  format: "$time $metadata[$level] $message\n",
  metadata: [:request_id]

Logger.debug("I am debugging info")
Logger.info("I am info-level message! Probably ok.")
Logger.warn("I am a warning! Beware!")
Logger.error("Oh no, this is not good :(")

Logger backends

def deps do
  [{:logger_papertrail_backend, "~> 1.1"}]
end

config :logger, :logger_papertrail_backend,
  url: "papertrail://logs.papertrailapp.com:<port>/<system_name>",
  level: :info,
  format: "$metadata $message"
  

config :logger,
  backends: [ :console,
    LoggerPapertrailBackend.Logger
  ]

Inspecting the logs

07:00:51.577 request_id=Fds-K5OR1daI51IAAAIx [info] Sent 200 in 1ms                                                                                                            
07:00:51.969 request_id=Fds-K6r-mtJMK_gAAAJB [info] GET /page/                                                                                                                 
07:00:51.970 request_id=Fds-K6r-mtJMK_gAAAJB [info] Sent 200 in 1ms                                                                                                            
07:00:52.324 request_id=Fds-K8AszvRYWgIAAAJR [info] GET /page/                                                                                                                 
07:00:52.350 [error] GenServer Myapp.CurrentUsageCalculator terminating
** (MatchError) no match of right hand side value: {:error, :limits_exceeded}
    (myapp) lib/myapp/current_usage_calculator.ex:26: Myapp.CurrentUsageCalculator.handle_cast/2
    (stdlib) gen_server.erl:637: :gen_server.try_dispatch/4
    (stdlib) gen_server.erl:711: :gen_server.handle_msg/6
    (stdlib) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
Last message: {:"$gen_cast", :increment}
State: %{usage: 10}
07:00:52.353 request_id=Fds-K8AszvRYWgIAAAJR [info] Converted exit {{{:badmatch, {:error, :limits_exceeded}}, [{Myapp.CurrentUsageCalculator, :handle_cast, 2, [file: 'lib/mya$
p/current_usage_calculator.ex', line: 26]}, {:gen_server, :try_dispatch, 4, [file: 'gen_server.erl', line: 637]}, {:gen_server, :handle_msg, 6, [file: 'gen_server.erl', line: 
711]}, {:proc_lib, :init_p_do_apply, 3, [file: 'proc_lib.erl', line: 249]}]}, {GenServer, :call, [Myapp.CurrentUsageCalculator, :get_usage, 5000]}} to 500 response
07:00:52.360 [error] #PID<0.565.0> running MyappWeb.Endpoint (connection #PID<0.454.0>, stream id 50) terminated
Server: localhost:4000 (http)
Request: GET /page/
** (exit) exited in: GenServer.call(Myapp.CurrentUsageCalculator, :get_usage, 5000)
    ** (EXIT) an exception was raised:
        ** (MatchError) no match of right hand side value: {:error, :limits_exceeded}
            (myapp) lib/myapp/current_usage_calculator.ex:26: Myapp.CurrentUsageCalculator.handle_cast/2
            (stdlib) gen_server.erl:637: :gen_server.try_dispatch/4
            (stdlib) gen_server.erl:711: :gen_server.handle_msg/6
            (stdlib) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
07:00:52.636 request_id=Fds-K9K-ZBZTynwAAAJh [info] GET /page/
07:00:52.636 request_id=Fds-K9K-ZBZTynwAAAJh [info] Converted exit {:noproc, {GenServer, :call, [Myapp.CurrentUsageCalculator, :get_usage, 5000]}} to 500 response
07:00:52.636 [error] #PID<0.567.0> running MyappWeb.Endpoint (connection #PID<0.454.0>, stream id 51) terminated
Server: localhost:4000 (http)
Request: GET /page/
** (exit) exited in: GenServer.call(Myapp.CurrentUsageCalculator, :get_usage, 5000)
    ** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
07:00:53.075 request_id=Fds-K-zvfZsVjAEAAAJx [info] GET /page/

Logs = messages + metadata

Use metadata

request_id=Fds-K6r-mtJMK_gAAAJB [info] GET /page/     
request_id=Fds-K5OR1daI51IAAAIx [info] Sent 200 in 1ms  
request_id=Fds-K6r-mtJMK_gAAAJB [info] Sent 200 in 1ms                                                                                                            
request_id=Fds-K8AszvRYWgIAAAJR [info] GET /page/                                                                                                                 
[error] GenServer Myapp.CurrentUsageCalculator terminating
** (MatchError) no match of right hand side value: {:error, :limits_exceeded}
    (myapp) lib/myapp/current_usage_calculator.ex:26: Myapp.CurrentUsageCalculator.handle_cast/2
    (stdlib) gen_server.erl:637: :gen_server.try_dispatch/4
    (stdlib) gen_server.erl:711: :gen_server.handle_msg/6
    (stdlib) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
Last message: {:"$gen_cast", :increment}
State: %{usage: 10}
request_id=Fds-K8AszvRYWgIAAAJR [info] Converted exit {{{:badmatch, {:error, :limits_exceeded}}, [{Myapp.CurrentUsageCalculator, :handle_cast, 2, [file: 'lib/mya$
p/current_usage_calculator.ex', line: 26]}, {:gen_server, :try_dispatch, 4, [file: 'gen_server.erl', line: 637]}, {:gen_server, :handle_msg, 6, [file: 'gen_server.erl', line: 
711]}, {:proc_lib, :init_p_do_apply, 3, [file: 'proc_lib.erl', line: 249]}]}, {GenServer, :call, [Myapp.CurrentUsageCalculator, :get_usage, 5000]}} to 500 response
07:00:52.360 [error] #PID<0.565.0> running MyappWeb.Endpoint (connection #PID<0.454.0>, stream id 50) terminated
Server: localhost:4000 (http)
Request: GET /page/
** (exit) exited in: GenServer.call(Myapp.CurrentUsageCalculator, :get_usage, 5000)
    ** (EXIT) an exception was raised:
        ** (MatchError) no match of right hand side value: {:error, :limits_exceeded}
            (myapp) lib/myapp/current_usage_calculator.ex:26: Myapp.CurrentUsageCalculator.handle_cast/2
            (stdlib) gen_server.erl:637: :gen_server.try_dispatch/4
            (stdlib) gen_server.erl:711: :gen_server.handle_msg/6
            (stdlib) proc_lib.erl:249: :proc_lib.init_p_do_apply/3

Plug.RequestId

...

defp set_request_id({conn, request_id}, header) do
    Logger.metadata(request_id: request_id)
    Conn.put_resp_header(conn, header, request_id)
end

...


config :logger, :console,
  format: "$time $metadata[$level] $message\n",
  metadata: [:request_id],
  level: :info
     

Use metadata in own code

...

Logger.metadata(user_id: user_id)

...


config :logger, :console,
  format: "$time $metadata[$level] $message\n",
  metadata: [:request_id, user_id],
  level: :info
     
request_id=JB user_id=1 [info] GET /page/     
request_id=Ix user_id=2 [info] Sent 200 in 1ms  
request_id=JB user_id=1 [info] Sent 200 in 1ms                                                                                                            
request_id=JR user_id=1 [info] GET /page/                                                                                                                 
[error] GenServer Myapp.CurrentUsageCalculator terminating
** (MatchError) no match of right hand side value: {:error, :limits_exceeded}
    (myapp) lib/myapp/current_usage_calculator.ex:26: Myapp.CurrentUsageCalculator.handle_cast/2
    (stdlib) gen_server.erl:637: :gen_server.try_dispatch/4
    (stdlib) gen_server.erl:711: :gen_server.handle_msg/6
    (stdlib) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
Last message: {:"$gen_cast", :increment}
State: %{usage: 10}
request_id=JR user_id=1 [info] Converted exit {{{:badmatch, {:error, :limits_exceeded}}, [{Myapp.CurrentUsageCalculator, :handle_cast, 2, [file: 'lib/mya$
p/current_usage_calculator.ex', line: 26]}, {:gen_server, :try_dispatch, 4, [file: 'gen_server.erl', line: 637]}, {:gen_server, :handle_msg, 6, [file: 'gen_server.erl', line: 
711]}, {:proc_lib, :init_p_do_apply, 3, [file: 'proc_lib.erl', line: 249]}]}, {GenServer, :call, [Myapp.CurrentUsageCalculator, :get_usage, 5000]}} to 500 response
07:00:52.360 [error] #PID<0.565.0> running MyappWeb.Endpoint (connection #PID<0.454.0>, stream id 50) terminated
Server: localhost:4000 (http)
Request: GET /page/
** (exit) exited in: GenServer.call(Myapp.CurrentUsageCalculator, :get_usage, 5000)
    ** (EXIT) an exception was raised:
        ** (MatchError) no match of right hand side value: {:error, :limits_exceeded}
            (myapp) lib/myapp/current_usage_calculator.ex:26: Myapp.CurrentUsageCalculator.handle_cast/2
            (stdlib) gen_server.erl:637: :gen_server.try_dispatch/4
            (stdlib) gen_server.erl:711: :gen_server.handle_msg/6
            (stdlib) proc_lib.erl:249: :proc_lib.init_p_do_apply/3

I know Clojure

Important thing to know about Logger

  • async/sync modes
  • switches to sync when buffer is full
  • use own log statements but *don't abuse it*
  • use own metadata but *don't abuse it*
  • can track metrics (but you probably want to use :telemetry instead)
  • abusing logger will result in performance punishment to your production environment!

Future of Logger

  • rely on OTP 21+ :logger
  • backward-compatible with current Logger (for now)
  • allow structured logging
  • allow sharing metadata with Erlang logs
  • new log levels (:emergency, :alert, :critical, :warning, :notice)
  • handlers written in Erlang (or Elixir!)
  • built-in logging to files (with file rotation!)
  • smooth integration with Lager (erlang-lager/lager)

Inspecting logs is just first step

Poking around production

Remote console

./bin/myapp remote_console

(or)

gigalixir ps:remote_console --app_name=myapp

iex>

Debugging OTP behaviours

:sys module - :sys.log/2

iex> pid = Process.whereis(CurrentUsageCalculator)       
#PID<0.435.0>

iex> :sys.log(pid, {true, 100})
:ok

iex> CurrentUsageCalculator.increment()           
:ok

iex> CurrentUsageCalculator.increment()
:ok

iex> CurrentUsageCalculator.get_usage()
2

:sys module - :sys.log/2

iex> :sys.log pid, :get
{:ok,
 [
   {:in, {:"$gen_cast", :increment}},
   {:noreply, %{usage: 1}},
   {:in, {:"$gen_cast", :increment}},
   {:noreply, %{usage: 2}},
   {:in,
    {:"$gen_call", {#PID<0.436.0>, #Reference<0.4113248558.1777598466.76538>},
     :get_usage}},
   {:out, 2, {#PID<0.436.0>, #Reference<0.4113248558.1777598466.76538>},
    %{usage: 2}} 
 ]}
 
 

:sys module - :sys.log/2

iex> :sys.log pid, :print
*DBG* 'CurrentUsageCalculator' got cast increment
*DBG* 'CurrentUsageCalculator' new state #{usage => 1}
*DBG* 'CurrentUsageCalculator' got cast increment
*DBG* 'CurrentUsageCalculator' new state #{usage => 2}
*DBG* 'CurrentUsageCalculator' got call get_usage from 
<0.436.0>
*DBG* 'CurrentUsageCalculator' sent 2 to <0.436.0>, new
state #{usage =>
                                                                                 2}
:ok
 

:sys module - :sys.trace/2

iex> :sys.trace pid, true
:ok
iex> CurrentUsageCalculator.get_usage()
*DBG* 'CurrentUsageCalculator' got call get_usage from
<0.436.0>
*DBG* 'CurrentUsageCalculator' sent 2 to <0.436.0>, new
state #{usage => 2}
2
iex> CurrentUsageCalculator.increment()
*DBG* 'CurrentUsageCalculator' got cast increment
:ok
*DBG* 'CurrentUsageCalculator' new state #{usage => 3}

:sys module - inspecting state

iex> :sys.get_state(pid)
%{usage: 6}

iex> :sys.replace_state(pid, fn(_) -> %{usage: 0} end)
%{usage: 0}

iex> :sys.get_state(pid)                              
%{usage: 0}

:sys module - turn off

iex> :sys.no_debug(pid)
:ok

:erlang.trace/3 #yolo

iex(27)> :erlang.trace pid, true, [:receive]
1
iex(28)> Myapp.CurrentUsageCalculator.increment()         
:ok
iex(29)> flush
{:trace, #PID<0.435.0>, :receive, {:"$gen_cast", :increment}}
:ok

:erlang.trace/3 #yolo

iex(27)> :erlang.trace :all, true, [:all]
1
iex(28)> Myapp.CurrentUsageCalculator.increment()         
:ok
iex(29)> flush
{:trace_ts, #PID<0.65.0>, :send,
 {#PID<0.65.0>,
  {:put_chars_sync, :unicode,
   "\e[39m{\e[0m\e[33m\e[36m:trace_ts\e[0m\e[33m\e[39m,\e[0m\e[33m #PID<0.65.0>\e[39m,\e[0m\e[33m \e[36m:send\e[0m\e[33m\e[39m,\e[0m\e[33m\n \e[39m{\e[0m\e[33m#PID<0.65.0>\e[39m,\e[0m\e[33m \e[39m{\e[0m\e[33m\e[36m:requests\e[0m\e[33m\e[39m,\e[0m\e[33m \e[39m[\e[0m\e[33m\e[39m{\e[0m\e[33m\e[36m:put_chars\e[0m\e[33m\e[39m,\e[0m\e[33m \e[36m:unicode\e[0m\e[33m\e[39m,\e[0m\e[33m 'iex(32)> '\e[39m}\e[0m\e[33m\e[39m]\e[0m\e[33m\e[39m}\e[0m\e[33m\e[39m}\e[0m\e[33m\e[39m,\e[0m\e[33m\n #PID<0.63.0>\e[39m,\e[0m\e[33m 2\e[39m,\e[0m\e[33m \e[39m{\e[0m\e[33m1574\e[39m,\e[0m\e[33m 928951\e[39m,\e[0m\e[33m 843593\e[39m}\e[0m\e[33m\e[39m}\e[0m\e[33m\n",
   {{#PID<0.436.0>, #Reference<0.4113248558.1777598466.80190>}, :ok}}},
 #PID<0.63.0>, 2, {1574, 928954, 598373}}
{:trace_ts, #PID<0.65.0>, :out, {:group, :server_loop, 3}, 2,
 {1574, 928954, 598374}}
{:trace_ts, #PID<0.63.0>, :in, {:user_drv, :server_loop, 6}, 2,
 {1574, 928954, 598377}}
{:trace_ts, #PID<0.63.0>, :receive,
 {#PID<0.65.0>,
  {:put_chars_sync, :unicode,
   "\e[39m{\e[0m\e[33m\e[36m:trace_ts\e[0m\e[33m\e[39m,\e[0m\e[33m #PID<0.65.0>\e[39m,\e[0m\e[33m \e[36m:send\e[0m\e[33m\e[39m,\e[0m\e[33m\n \e[39m{\e[0m\e[33m#PID<0.65.0>\e[39m,\e[0m\e[33m \e[39m{\e[0m\e[33m\e[36m:requests\e[0m\e[33m\e[39m,\e[0m\e[33m \e[39m[\e[0m\e[33m\e[39m{\e[0m\e[33m\e[36m:put_chars\e[0m\e[33m\e[39m,\e[0m\e[33m \e[36m:unicode\e[0m\e[33m\e[39m,\e[0m\e[33m 'iex(32)> '\e[39m}\e[0m\e[33m\e[39m]\e[0m\e[33m\e[39m}\e[0m\e[33m\e[39m}\e[0m\e[33m\e[39m,\e[0m\e[33m\n #PID<0.63.0>\e[39m,\e[0m\e[33m 2\e[39m,\e[0m\e[33m \e[39m{\e[0m\e[33m1574\e[39m,\e[0m\e[33m 928951\e[39m,\e[0m\e[33m 843593\e[39m}\e[0m\e[33m\e[39m}\e[0m\e[33m\n",
   {{#PID<0.436.0>, #Reference<0.4113248558.1777598466.80190>}, :ok}}}, 2,
 {1574, 928954, 598378}}
{:trace_ts, #PID<0.63.0>, :out, {:user_drv, :server_loop, 6}

:erlang.trace/3 #yolo

{:trace_ts, #PID<0.65.0>, :send,
 {#PID<0.65.0>,
  {:put_chars_sync, :unicode,
   "\e[39m{\e[0m\e[33m\e[36m:trace_ts\e[0m\e[33m\e[39m,\e[0m\e[33m #PID<0.65.0>\e[39m,\e[0m\e[33m \e[36m:send\e[0m\e[33m\e[39m,\e[0m\e[33m\n \e[39m{\e[0m\e[33m#PID<0.65.0>\e[39m,\e[0m\e[33m \e[39m{\e[0m\e[33m\e[36m:requests\e[0m\e[33m\e[39m,\e[0m\e[33m \e[39m[\e[0m\e[33m\e[39m{\e[0m\e[33m\e[36m:put_chars\e[0m\e[33m\e[39m,\e[0m\e[33m \e[36m:unicode\e[0m\e[33m\e[39m,\e[0m\e[33m 'iex(32)> '\e[39m}\e[0m\e[33m\e[39m]\e[0m\e[33m\e[39m}\e[0m\e[33m\e[39m}\e[0m\e[33m\e[39m,\e[0m\e[33m\n #PID<0.63.0>\e[39m,\e[0m\e[33m 2\e[39m,\e[0m\e[33m \e[39m{\e[0m\e[33m1574\e[39m,\e[0m\e[33m 928951\e[39m,\e[0m\e[33m 843593\e[39m}\e[0m\e[33m\e[39m}\e[0m\e[33m\n",
   {{#PID<0.436.0>, #Reference<0.4113248558.1777598466.80190>}, :ok}}},
 #PID<0.63.0>, 2, {1574, 928954, 598373}}
{:trace_ts, #PID<0.65.0>, :out, {:group, :server_loop, 3}, 2,
 {1574, 928954, 598374}}
{:trace_ts, #PID<0.63.0>, :in, {:user_drv, :server_loop, 6}, 2,
 {1574, 928954, 598377}}
{:trace_ts, #PID<0.63.0>, :receive,
 {#PID<0.65.0>,
  {:put_chars_sync, :unicode,
   "\e[39m{\e[0m\e[33m\e[36m:trace_ts\e[0m\e[33m\e[39m,\e[0m\e[33m #PID<0.65.0>\e[39m,\e[0m\e[33m \e[36m:send\e[0m\e[33m\e[39m,\e[0m\e[33m\n \e[39m{\e[0m\e[33m#PID<0.65.0>\e[39m,\e[0m\e[33m \e[39m{\e[0m\e[33m\e[36m:requests\e[0m\e[33m\e[39m,\e[0m\e[33m \e[39m[\e[0m\e[33m\e[39m{\e[0m\e[33m\e[36m:put_chars\e[0m\e[33m\e[39m,\e[0m\e[33m \e[36m:unicode\e[0m\e[33m\e[39m,\e[0m\e[33m 'iex(32)> '\e[39m}\e[0m\e[33m\e[39m]\e[0m\e[33m\e[39m}\e[0m\e[33m\e[39m}\e[0m\e[33m\e[39m,\e[0m\e[33m\n #PID<0.63.0>\e[39m,\e[0m\e[33m 2\e[39m,\e[0m\e[33m \e[39m{\e[0m\e[33m1574\e[39m,\e[0m\e[33m 928951\e[39m,\e[0m\e[33m 843593\e[39m}\e[0m\e[33m\e[39m}\e[0m\e[33m\n",
   {{#PID<0.436.0>, #Reference<0.4113248558.1777598466.80190>}, :ok}}}, 2,
 {1574, 928954, 598378}}
{:trace_ts, #PID<0.63.0>, :out, {:user_drv, :server_loop, 6}

:erlang.trace/3 #yolo

{:trace_ts, #PID<0.65.0>, :send,
 {#PID<0.65.0>,
  {:put_chars_sync, :unicode,
   "\e[39m{\e[0m\e[33m\e[36m:trace_ts\e[0m\e[33m\e[39m,\e[0m\e[33m #PID<0.65.0>\e[39m,\e[0m\e[33m \e[36m:send\e[0m\e[33m\e[39m,\e[0m\e[33m\n \e[39m{\e[0m\e[33m#PID<0.65.0>\e[39m,\e[0m\e[33m \e[39m{\e[0m\e[33m\e[36m:requests\e[0m\e[33m\e[39m,\e[0m\e[33m \e[39m[\e[0m\e[33m\e[39m{\e[0m\e[33m\e[36m:put_chars\e[0m\e[33m\e[39m,\e[0m\e[33m \e[36m:unicode\e[0m\e[33m\e[39m,\e[0m\e[33m 'iex(32)> '\e[39m}\e[0m\e[33m\e[39m]\e[0m\e[33m\e[39m}\e[0m\e[33m\e[39m}\e[0m\e[33m\e[39m,\e[0m\e[33m\n #PID<0.63.0>\e[39m,\e[0m\e[33m 2\e[39m,\e[0m\e[33m \e[39m{\e[0m\e[33m1574\e[39m,\e[0m\e[33m 928951\e[39m,\e[0m\e[33m 843593\e[39m}\e[0m\e[33m\e[39m}\e[0m\e[33m\n",
   {{#PID<0.436.0>, #Reference<0.4113248558.1777598466.80190>}, :ok}}},
 #PID<0.63.0>, 2, {1574, 928954, 598373}}
{:trace_ts, #PID<0.65.0>, :out, {:group, :server_loop, 3}, 2,
 {1574, 928954, 598374}}
{:trace_ts, #PID<0.63.0>, :in, {:user_drv, :server_loop, 6}, 2,
 {1574, 928954, 598377}}
{:trace_ts, #PID<0.63.0>, :receive,
 {#PID<0.65.0>,
  {:put_chars_sync, :unicode,
   "\e[39m{\e[0m\e[33m\e[36m:trace_ts\e[0m\e[33m\e[39m,\e[0m\e[33m #PID<0.65.0>\e[39m,\e[0m\e[33m \e[36m:send\e[0m\e[33m\e[39m,\e[0m\e[33m\n \e[39m{\e[0m\e[33m#PID<0.65.0>\e[39m,\e[0m\e[33m \e[39m{\e[0m\e[33m\e[36m:requests\e[0m\e[33m\e[39m,\e[0m\e[33m \e[39m[\e[0m\e[33m\e[39m{\e[0m\e[33m\e[36m:put_chars\e[0m\e[33m\e[39m,\e[0m\e[33m \e[36m:unicode\e[0m\e[33m\e[39m,\e[0m\e[33m 'iex(32)> '\e[39m}\e[0m\e[33m\e[39m]\e[0m\e[33m\e[39m}\e[0m\e[33m\e[39m}\e[0m\e[33m\e[39m,\e[0m\e[33m\n #PID<0.63.0>\e[39m,\e[0m\e[33m 2\e[39m,\e[0m\e[33m \e[39m{\e[0m\e[33m1574\e[39m,\e[0m\e[33m 928951\e[39m,\e[0m\e[33m 843593\e[39m}\e[0m\e[33m\e[39m}\e[0m\e[33m\n",
   {{#PID<0.436.0>, #Reference<0.4113248558.1777598466.80190>}, :ok}}}, 2,
 {1574, 928954, 598378}}
{:trace_ts, #PID<0.63.0>, :out, {:user_drv, :server_loop, 6}

:erlang.trace/3 #yolo

09:22:41.730 [info] SIGTERM received - shutting down

[1]    31899 killed

:recon_trace

iex> :recon_trace.calls({String, :split, 2}, 1)
1
iex> for i <- 1..100 do String.split("asdfasdf", "a") end


9:30:38.888919 <0.630.0> 
'Elixir.String':split(<<"asdfasdf">>, <<"a">>)

Recon tracer rate limit tripped.

:recon_trace

  • safer on production
  • useful during development
  • add "recon" to your deps today!
  • check out :recon and other :recon_ modules too!

Monitoring production

Exception tracking

Two ways to do it

  • try ... catch
  • Logger way
  • Combination of the two

appsignal-elixir

Catching exceptions: Plug

use Appsignal.Plug

...

conn = Plug.Conn.put_private(conn, 
                             :appsignal_transaction,
                              transaction)

try do
  super(conn, opts)
catch
  kind, reason -> Appsignal.Plug.handle_error(
                  conn, kind, reason,
                  System.stacktrace())
else
  conn -> Appsignal.Plug.finish_with_conn(
                           transaction,
                           conn)
end

More on our blog:

Catching exceptions: logs

:logger.add_handler(
  :appsignal,
  Appsignal.LoggerHandler, %{}
 )

While we're at it...

:telemetry.attach(
  "appsignal-ecto",
  [:my_app, :repo, :query],
  &Appsignal.Ecto.handle_event/4,
  nil
)

:telemetry

:telemetry - dynamic dispatch library

:telemetry.execute(
  [:web, :request, :done],
  %{latency: latency},
  %{request_path: path, status_code: status}
)

:telemetry - dynamic dispatch library

defmodule LogResponseHandler do
  require Logger

  def handle_event([:web, :request, :done], 
                   measurements, metadata,
                   _config) do
    Logger.info(
      "[#{metadata.request_path}] 
        #{metadata.status_code}
        sent in #{measurements.latency}")
  end
end

:telemetry - dynamic dispatch library

:ok = :telemetry.attach(
  # unique handler id
  "log-response-handler",
  [:web, :request, :done],
  &LogResponseHandler.handle_event/4,
  nil
)

:telemetry notes

  • tiny library
  • attach happens at runtime
  • ideal to be used by library authors
  • is becoming de-facto instrumentation standard in Elixir (and Erlang?) ecosystem
  • telemetry_metrics, telemetry_poller

Distributed tracing

OpenTelemetry

OpenCensus + OpenTracing

OpenTelemetry

  • Defines a standard for distributed tracing
  • Implementation independent
  • Multiple back-ends (Jager, DataDog etc.)
  • traces - outer containers for actions
  • spans - inner containers for actions
  • automatic trace_id propagation
  • spandex

Thank you!

Keeping tabs on production

By Hubert Łępicki

Keeping tabs on production

  • 1,440