Fault tolerance in Ruby
Hubert Łepicki
wroc_love.rb 2017
@hubertlepicki
We'll talk fault tolerance...
...concurrency and scalability...
...and do some time travel!
Fault tolerance?
Fault tolerance is the property that enables a system to continue operating properly in the event of the failure of (or one or more faults within) some of its components.
random people from Wikipedia
(...) fault tolerance is the ability for software to detect and recover from a fault that is happening (...)
https://users.ece.cmu.edu/~koopman/des_s99/sw_fault_tolerance/
Fault tolerance is hard
Let's go back to year 1992
"Of course 5 years from now that will be different, but 5 years from now everyone will be running free GNU on their 200 MIPS, 64M SPARCstation-5."
Implementing fault-tolerant systems costs (money, time, effort, complexity)
Basic defence techniques
Exceptions
begin
  ...
rescue SomeExceptionClass => error
  logger.info "Exception caught"
  ...
end
begin
  ...
rescue SomeExceptionClass, SomeOther => error
  logger.info "Exception caught"
  ...
end
begin
  ...
rescue SomeException => e
  ...
rescue SomeOther => e
  ...
end
tries = 0
begin
  tries += 1
  ...
rescue
  retry if tries < 4
  ...
end
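The retry slide above can be wrapped into a reusable helper. This is only a sketch: `with_retries` and its `max_tries:` / `on:` parameters are hypothetical names, and the flaky operation is simulated. Unlike a bare `rescue`, it re-raises once the attempts run out, so a persistent failure is not silently swallowed.

```ruby
# Hypothetical helper: runs the block up to max_tries times and
# re-raises the last error when every attempt has failed.
def with_retries(max_tries: 3, on: StandardError)
  tries = 0
  begin
    tries += 1
    yield tries
  rescue on
    retry if tries < max_tries
    raise
  end
end

# Simulated flaky operation: fails twice, then succeeds.
calls = 0
result = with_retries(max_tries: 4) do
  calls += 1
  raise IOError, "temporary failure" if calls < 3
  "ok after #{calls} calls"
end
puts result
```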
f = open("file.txt") rescue nil
begin
  AWSWrapper.some_external_operation
rescue StandardError => error
  logger.error "Exception intercepted calling AWSWrapper"
  logger.error error.message
  logger.error error.backtrace.join("\n")
end
require 'timeout'
begin
  Timeout.timeout(3) do
    AWSWrapper.some_external_operation
  end
rescue StandardError => error
  logger.error "Exception intercepted calling AWSWrapper"
  logger.error error.message
  logger.error error.backtrace.join("\n")
end
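A runnable variation on the timeout slide, assuming you want to tell a timeout apart from other failures. `slow_operation` is a hypothetical stand-in for the external call. `Timeout::Error` is a `StandardError` subclass, so the broad rescue above also catches it. Rescuing it by name simply lets the caller react differently.

```ruby
require 'timeout'

# Hypothetical stand-in for a slow external call.
def slow_operation
  sleep 2
  :done
end

outcome =
  begin
    Timeout.timeout(0.1) { slow_operation }
  rescue Timeout::Error
    # The block did not finish within 0.1 seconds.
    :timed_out
  end
puts outcome
```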
Prevent cascading system failures
unicorn master -c /app/unicorn.rb -E production -D
unicorn worker[0] -c /app/unicorn.rb -E production -D
unicorn worker[1] -c /app/unicorn.rb -E production -D
unicorn worker[2] -c /app/unicorn.rb -E production -D
unicorn worker[3] -c /app/unicorn.rb -E production -D
Simple bulkhead may not be enough
Fail fast
to prevent crashing whole system
Circuit Breaker Pattern
- Allows requests while in the "closed" state
- Detects failures and switches to the "open" state
- Fails all requests while in the "open" state
- Switches to the "half-open" state after some interval, then back to "closed" once a trial request succeeds
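The state machine in the list above can be sketched in plain Ruby. This is an illustration under assumptions, not the API of any gem: `SimpleBreaker`, its parameters, and the injectable `clock` are all hypothetical, and the class is not thread-safe. After the timeout it lets one trial request through (the state commonly called "half-open") and closes again if that request succeeds.

```ruby
# Minimal single-threaded circuit breaker sketch (hypothetical class).
class SimpleBreaker
  class OpenError < StandardError; end

  MONOTONIC_CLOCK = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }

  attr_reader :state

  def initialize(error_threshold: 3, error_timeout: 10, clock: MONOTONIC_CLOCK)
    @error_threshold = error_threshold
    @error_timeout = error_timeout
    @clock = clock
    @failures = 0
    @state = :closed
    @opened_at = nil
  end

  def call
    if @state == :open
      # Fail fast until the timeout elapses, then allow one trial request.
      raise OpenError, "circuit open" if @clock.call - @opened_at < @error_timeout
      @state = :half_open
    end

    begin
      result = yield
    rescue StandardError
      record_failure
      raise
    end
    reset
    result
  end

  private

  def record_failure
    @failures += 1
    if @state == :half_open || @failures >= @error_threshold
      @state = :open
      @opened_at = @clock.call
    end
  end

  def reset
    @failures = 0
    @state = :closed
  end
end

# Usage with a fake clock so the example runs instantly.
now = 0.0
breaker = SimpleBreaker.new(error_threshold: 2, error_timeout: 10, clock: -> { now })

2.times do
  begin
    breaker.call { raise IOError, "remote service down" }
  rescue IOError
    # Expected: the simulated remote call fails.
  end
end
puts breaker.state

begin
  breaker.call { :never_runs }
rescue SimpleBreaker::OpenError => e
  puts e.message
end

now += 11 # pretend the timeout has elapsed
puts breaker.call { :recovered }
puts breaker.state
```

Injecting the clock keeps the sketch testable without sleeping. A production breaker would also need locking or cross-process coordination.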
require 'circuit_breaker'
class ApiWrapper
  include CircuitBreaker
  def call_remote_service
    ...
  end
  circuit_method :call_remote_service
end
Semian
- circuit breaker
- ready to use adapters (mysql, redis, net/http)
- fail fast philosophy
- works with Ruby / OS processes (but NOT threads)
- uses Unix IPC to synchronize
- modifies behaviors of third party libs
SEMIAN_PARAMETERS = { tickets: 1,
                      success_threshold: 1,
                      error_threshold: 3,
                      error_timeout: 10 }
Semian::NetHTTP.semian_configuration = proc do |host, port|
  # Let's make it only active for www.wrocloverb.com
  if host == "www.wrocloverb.com"
    SEMIAN_PARAMETERS.merge(name: "wroc_love.rb")
  else
    nil
  end
end
Microservices/SOA/breaking up into N apps
Persistent connections
Systems always break when no one is looking
opts = {
  ...
  heartbeat: 0,
  ...
}
Reason: firewalls and other "smart" network gear
Solution: enable heartbeat
On Linux, it takes 11-25 minutes to detect a "dead" TCP connection
While we're on RabbitMQ/Microservices/you name it...
Let's go back to year 1986
Actor model concurrency
Let it fail / fail fast
Avoid defensive programming
Built-in mechanisms to detect crashes
(monitors, links)
Built-in mechanism to recover from errors
(supervisors)
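A rough analogy in plain Ruby threads, to make "let it fail" plus supervision concrete. `TinySupervisor` is a hypothetical class, not an Erlang or Celluloid API. The worker contains no defensive code: it simply crashes, and the supervisor notices and restarts it up to `max_restarts` times.

```ruby
# Hypothetical supervisor: runs a worker block in a thread and restarts
# it when it crashes, giving up after max_restarts restarts.
class TinySupervisor
  attr_reader :restarts

  def initialize(max_restarts: 3, &worker)
    @max_restarts = max_restarts
    @worker = worker
    @restarts = 0
  end

  def run
    loop do
      thread = Thread.new do
        # Crashes are handled by the supervisor, so silence the default report.
        Thread.current.report_on_exception = false
        @worker.call
      end
      begin
        # Thread#value waits for the worker and re-raises its exception here.
        return thread.value
      rescue StandardError => error
        raise error if @restarts >= @max_restarts
        @restarts += 1
      end
    end
  end
end

# Simulated worker that crashes twice before finishing.
attempts = 0
supervisor = TinySupervisor.new(max_restarts: 5) do
  attempts += 1
  raise "crash number #{attempts}" if attempts < 3
  "finished on attempt #{attempts}"
end
puts supervisor.run
puts supervisor.restarts
```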
Kill 2 birds with one stone
Actor model for Ruby?
class Counter
  # This is all you have to do to turn any Ruby
  # class into one which creates
  # Celluloid actors instead of normal objects
  include Celluloid
  # Now just define methods like you ordinarily would
  attr_reader :count
  def initialize
    @count = 0
  end
  def increment(n = 1)
    @count += n
  end
end
actor = Counter.new
p actor.count
# Log *all* exceptions thrown by *all* actors in the system
Celluloid.exception_handler { |ex| MyNotifier.notify(ex) }
# Reference your actors by name
Celluloid::Actor[:itchy] = Itchy.new
Celluloid::Actor[:itchy].scratch
# Supervise your actors
class MyGroup < Celluloid::SupervisionGroup
  supervise Itchy, as: :itchy
end
Celluloid
- turns objects into actors
- allows linking actors
- allows supervision/restarts
Let's move forward to year 2020
Ruby 3.0.0
Koichi Sasada
proposed
Guilds
Elixir/Erlang-inspired
Elements of immutability and actor model concurrency
Guilds will simplify concurrency
Guilds will improve fault tolerance
Resources
Circuit Breaker
by Martin Fowler
A proposal of new concurrency model for Ruby 3
Koichi Sasada
http://www.atdot.net/~ko1/activities/2016_rubykaigi.pdf
https://www.youtube.com/watch?v=WIrYh14H9kA&feature=youtu.be
That's it for today!
Thanks!