}
Amazon profiler team
previously @ Talkdesk
Focus for this talk
JRuby
TruffleRuby
Rubinius
MRI Ruby
[Diagram: timelines for Thread 1 and Thread 2 along a time axis, drawn twice]
(and/or other concurrency primitives!)
Improve Efficiency ➜
Get the most out of your hardware budget
Reduce Latency ➜ Happy users
Lazy initialization with ||= is not atomic
def client
@database_client ||= new_client
end
def new_client
sleep 1 # simulate connection setup time
puts "Created database client"
:client
end
5.times.map {
Thread.new { client } }.each(&:join)
$ ruby database_client_race.rb
Created database client
Created database client
Created database client
Created database client
Created database client
Restart your service under load ➜ all your threads will try to connect to the database
Overwhelm your database with new connections ➜ Go over limit?
Resources may get leaked ➜
What if the database client keeps own threads? Other global resources?
Maybe it's not a database client ➜
Maybe it sends off an e-mail, triggers an alarm, charges someone's credit card, ...
Perhaps adding a Mutex will fix it?
DATABASE_INIT_MUTEX = Mutex.new
def client
@database_client ||=
DATABASE_INIT_MUTEX.synchronize {
new_client
}
end
$ ruby database_client_race_fix_attempt.rb
Created database client
Created database client
Created database client
Created database client
Created database client
Need to check that other threads haven't gotten there first!
DATABASE_INIT_MUTEX = Mutex.new
def client
DATABASE_INIT_MUTEX.synchronize {
@database_client ||= new_client
}
end
Works!
$ ruby database_client_race_fixed_1.rb
Created database client
Important note: Make sure that all your threads use the same Mutex.
def client
Mutex.new.synchronize { # this is wrong! 😭
@database_client ||= new_client
}
end
Be careful when being concurrent & lazy
Always re-check state after acquiring the Mutex
All threads need to synchronize on the same Mutex
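A minimal, runnable sketch of the working fix, with the Mutex kept per object so every caller of the same instance shares it. The class name `Gateway` and the `clients_created` counter are hypothetical and only there to make the result observable.

```ruby
# Hypothetical wrapper class; only the locking pattern comes from the slides.
class Gateway
  attr_reader :clients_created

  def initialize
    @init_mutex = Mutex.new # one Mutex shared by every caller of this object
    @clients_created = 0
  end

  def client
    # The ||= check runs INSIDE the lock, so a thread that waited for the
    # Mutex sees the client created by the thread that got there first.
    @init_mutex.synchronize { @database_client ||= new_client }
  end

  private

  def new_client
    sleep 0.1 # simulate connection setup time
    @clients_created += 1
    :client
  end
end

gateway = Gateway.new
results = 5.times.map { Thread.new { gateway.client } }.map(&:value)
puts "Clients created: #{gateway.clients_created}"
```

All five threads get the same client, and the setup code runs once.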
Don't change and iterate hashes concurrently
@hash = {}
t1 = Thread.new {
10_000.times { |n| @hash[n] = true }
}
t2 = Thread.new {
@hash.each { } while @hash.size < 10_000
}
t1.join rescue nil
t2.join rescue nil
$ ruby concurrent_iterate_mutate.rb
Traceback (most recent call last):
2: from concurrent_iterate_mutate.rb:3:in `block in <main>'
1: from concurrent_iterate_mutate.rb:3:in `times'
concurrent_iterate_mutate.rb:3:in `block (2 levels) in <main>':
can't add a new key into hash during iteration (RuntimeError)
Obvious in this example ➜ not at all obvious in production
Avoid sharing ➜ Create a copy instead
Use a persistent data structure ➜
Similar to copy, but more efficient
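One way to "create a copy instead", sketched with the standard library only: the writer and the copy step share a Mutex, and the reader iterates over its private snapshot outside the lock. The names `lock`, `snapshot` and `iterations` are illustrative, not from the original example.

```ruby
lock = Mutex.new
hash = {}

writer = Thread.new do
  10_000.times { |n| lock.synchronize { hash[n] = true } }
end

iterations = 0
reader = Thread.new do
  loop do
    # Copy while holding the lock; nobody can add keys during the dup.
    snapshot = lock.synchronize { hash.dup }
    # Iterate the private copy; the writer keeps going in the meantime.
    snapshot.each { }
    iterations += 1
    break if snapshot.size >= 10_000
  end
end

writer.join
reader.join
puts "Iterated #{iterations} snapshots without errors"
```

The copy costs time and memory on every pass, which is the inefficiency a persistent data structure avoids.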
Use concurrent-ruby gem
require 'concurrent'
@hash = Concurrent::Map.new
# instead of
@hash = {}
Avoid mutating any shared object ➜
Not just hashes
Can break only under production traffic ➜
Specs/integration tests are too simple to catch
Concurrent Ruby is your friend ➜
If you really need to mutate shared stuff
Thread/Fiber-local variables behave somewhat like globals
class PerThreadGateway
def initialize(address)
@address = address
end
def client
Thread.current[:connection] ||= new_client
end
def new_client
puts "Creating a new client!"
:"client_for_#{@address}"
end
end
g1 = PerThreadGateway.new("google.com")
g2 = PerThreadGateway.new("github.com")
puts "For gateway 1, got #{g1.client}"
puts "For gateway 2, got #{g2.client}"
$ ruby thread_local_variables.rb
Creating a new client!
For gateway 1, got client_for_google.com
For gateway 2, got client_for_google.com
Use concurrent-ruby's ThreadLocalVar
require 'concurrent'
class PerThreadGateway
def initialize(address)
@address = address
@thread_local = Concurrent::ThreadLocalVar.new
end
def client
@thread_local.value ||= new_client
end
# ...
Each ThreadLocalVar instance keeps a different per-thread value
$ ruby thread_local_variables_fixed.rb
Creating a new client!
For gateway 1, got client_for_google.com
Creating a new client!
For gateway 2, got client_for_github.com
Thread/Fiber-local variables are a low-level mechanism ➜
Understand it well before using
Use concurrent-ruby's higher-level alternative
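One low-level detail worth understanding first: `Thread.current[:key]` is actually fiber-local, while `Thread#thread_variable_get` and `Thread#thread_variable_set` are truly thread-local. A small sketch, where the `:request_id` key is just an illustrative name:

```ruby
Thread.current[:request_id] = 42                     # fiber-local, despite the name
Thread.current.thread_variable_set(:request_id, 42)  # truly thread-local

# A new fiber on the SAME thread loses the first value but keeps the second.
inside_fiber = Fiber.new do
  [Thread.current[:request_id],
   Thread.current.thread_variable_get(:request_id)]
end.resume

# A new thread sees neither value.
inside_thread = Thread.new do
  [Thread.current[:request_id],
   Thread.current.thread_variable_get(:request_id)]
end.value

puts "Inside fiber:  #{inside_fiber.inspect}"
puts "Inside thread: #{inside_thread.inspect}"
```

Either way the key is shared by all code running on that thread or fiber, so two objects using the same key still collide.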
Default values by reference vs. using a block
array1 = Array.new(5, {})
array2 = Array.new(5) { {} }
hash1 = Hash.new([])
hash2 = Hash.new { [] }
# or even....
local1 = Concurrent::ThreadLocalVar.new([])
local2 = Concurrent::ThreadLocalVar.new { [] }
array1 = Array.new(5, {})
array1.first[:hello] = :world
puts "Array1 (with Reference): #{array1.inspect}"
# Output:
Array1 (with Reference): [{:hello=>:world},
{:hello=>:world}, {:hello=>:world}, {:hello=>:world},
{:hello=>:world}]
array2 = Array.new(5) { {} }
array2.first[:hello] = :world
puts "Array2 (with Block): #{array2.inspect}"
# Output:
Array2 (with Block): [{:hello=>:world}, {}, {}, {}, {}]
hash1 = Hash.new([])
hash1[:hello] << :world
hash1[:foo] << :bar
puts "Hash1 (with Reference): #{hash1[:hello].inspect}"
# Output:
Hash1 (with Reference): [:world, :bar]
hash2 = Hash.new { [] }
hash2[:hello] << :world
hash2[:foo] << :bar
puts "Hash2 (with Block): #{hash2[:hello].inspect}"
# Output:
Hash2 (with Block): []
local1 = Concurrent::ThreadLocalVar.new([])
Thread.new { local1.value << :hello }.join
Thread.new { local1.value << :world }.join
puts "Threadlocal1 (with Reference): #{Thread.new { local1.value }.value}"
# Output
Threadlocal1 (with Reference): [:hello, :world]
local2 = Concurrent::ThreadLocalVar.new { [] }
Thread.new { local2.value << :hello }.join
Thread.new { local2.value << :world }.join
puts "Threadlocal2 (with Block): #{Thread.new { local2.value }.value}"
# Output
Threadlocal2 (with Block): []
Default value by reference ➜
Always the same object
Default value using block ➜
Different objects
(unless you return an existing reference)
Both by reference/using block are useful ➜
Be very careful picking the right one
Thread-local variant ➜ Especially evil
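For the hash case there is a third option the examples above don't show: a default block that also stores what it creates. This is a sketch of that standard `Hash.new` form; the variable name `groups` is made up for illustration.

```ruby
# The block receives the hash and the missing key, so it can store the
# fresh array instead of handing back a throwaway one.
groups = Hash.new { |hash, key| hash[key] = [] }

groups[:hello] << :world
groups[:foo] << :bar

puts "Groups: #{groups.inspect}"
```

Each key now gets its own array and the appended values are kept. This only fixes the default-value trap; the hash is still a plain shared object and needs the same care as any other when used from several threads.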
class Foo
def hello; puts "Hello!"; end
sleep 1 # Simulate slow load
def bye; puts "Goodbye!"; end
end
require, autoload and class definitions are not atomic
Thread.new { require './foo' }
sleep 0.1
f = Foo.new
f.hello
f.bye
$ ruby background_require.rb
Hello!
Traceback (most recent call last):
background_require.rb:7:in `<main>': undefined method
`bye' for #<Foo:0x00005622a9a3cbf0> (NoMethodError)
Require before use (in the current thread)
Alternative ➜
Require once before all threads start
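A sketch of "require before use (in the current thread)". It relies on the assumption that, on MRI, `require` waits when another thread is already loading the same file. The file name `slow_foo.rb` and the class `SlowFoo` are hypothetical stand-ins for the slow-loading `foo.rb` above.

```ruby
require 'tmpdir'

# Write a slow-loading file to a temporary directory.
dir = Dir.mktmpdir
path = File.join(dir, 'slow_foo.rb')
File.write(path, <<~RUBY)
  class SlowFoo
    def hello; "Hello!"; end
    sleep 0.3 # simulate slow load
    def bye; "Goodbye!"; end
  end
RUBY

loader = Thread.new { require path }
sleep 0.1 # the background load is most likely in progress by now

# Requiring in the thread that uses the class: this call returns only
# once the file has been loaded completely, whoever started loading it.
require path
f = SlowFoo.new
greetings = [f.hello, f.bye]
loader.join

puts greetings
```

Without the second `require`, the main thread could see a half-defined `SlowFoo`, as in the failing example.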
Be careful when being concurrent & lazy ➜ (Again)
Ruby's internal structures are thread-safe ➜ but that often doesn't extend to your code
Shared worker/interactor/controller with state
class Worker
def call(foo, bar)
@foo = foo; @bar = bar
some_domain_logic
@foo + @bar
end
def some_domain_logic; sleep(rand) end
end
worker = Worker.new
t1 = Thread.new { puts "Result for w1 is #{worker.call(1, 2)}" }
t2 = Thread.new { puts "Result for w2 is #{worker.call(3, 4)}" }
t1.join; t2.join
$ ruby shared_worker.rb
Result for w1 is 3
Result for w2 is 3
$ ruby shared_worker.rb
Result for w1 is 7
Result for w2 is 7
worker1 = Worker.new
worker2 = Worker.new
t1 = Thread.new { puts "Result for w1 is #{worker1.call(1, 2)}" }
t2 = Thread.new { puts "Result for w2 is #{worker2.call(3, 4)}" }
Don't share objects and mutate their state at the same time 😁 😁 😁
Objects are cheap ➜
Be careful about over-optimization
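If a worker really has to be shared, one option is to keep per-call data in arguments and local variables instead of instance variables. A minimal sketch; the class name `StatelessWorker` is hypothetical, and the `freeze` is an extra guard that turns an accidental instance variable write into a `FrozenError`.

```ruby
class StatelessWorker
  def call(foo, bar)
    some_domain_logic
    foo + bar # locals belong to this call only, so threads cannot clash
  end

  private

  def some_domain_logic
    sleep(rand / 10)
  end
end

# Frozen: any attempt to assign an instance variable raises FrozenError.
worker = StatelessWorker.new.freeze

results = [[1, 2], [3, 4]].map do |foo, bar|
  Thread.new { worker.call(foo, bar) }
end.map(&:value)

puts "Results: #{results.inspect}"
```

Each thread gets the result for its own arguments, however the calls interleave.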
Ruth Hartnup https://www.flickr.com/photos/ruthanddave/8300870890 CC-BY 2.0
Use concurrent-ruby!
● Thread pools
● Elixir/Erlang-like actors
● Go-like channels
● Promises
● Maybe/Optional
● Supports MRI Ruby & JRuby/TruffleRuby/Rubinius
● Dependency of activesupport ➜ You already have it
Use concurrent-ruby!
Do not mutate shared objects
Use immutable data structures ➜
persistent-dmnd gem
require 'persistent-💎'
immutable_array = a💎[:hello, :talkdesk]
immutable_hash = h💎[I: 'hope', "you're": 'enjoying']
immutable_set = s💎[:the, :"weather!"]
JRuby/TruffleRuby/Rubinius's threads are parallel ➜
Faster concurrency!
Have fun! 🙋
my ruby/jruby/whatever blog: https://ivoanjo.me
Code examples @ gitlab.com/ivoanjo/unsafe-concurrent-ruby-patterns
Concurrent-ruby gem: www.concurrent-ruby.com
Persistent-💎 gem: gitlab.com/ivoanjo/persistent-dmnd
@knux