(Diagram: Thread 1 and Profiler Thread timelines over time)
IS THIS OK...?
OR, SPOTTING UNSAFE CONCURRENT RUBY PATTERNS
who am i
Ivo Anjo
Ruby ❤
Concurrency ❤
JRuby ❤
Amazon profiler team
Previously @ Talkdesk
CONCURRENCY: WHAT?
DEALING WITH
multiple things
during overlapping time periods
RUBY Concurrency primitives
Threads
FIBERS
PROCESSES
Focus for this talk
Threads
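A small sketch of the two in-process primitives, using only Ruby core classes: a Thread is scheduled by the VM, while a Fiber only runs when it is explicitly resumed. Processes (e.g. via fork) are left out here because fork is not available on every platform; the variable names are just for illustration.

```ruby
# Thread: scheduled by the VM, which can switch between threads on its own
thread = Thread.new { :hello_from_thread }
thread_result = thread.value # waits for the thread, then returns its block's value

# Fiber: cooperative; it runs only when resumed and pauses at Fiber.yield
fiber = Fiber.new do
  Fiber.yield :first_step
  :second_step
end
fiber_results = [fiber.resume, fiber.resume]

p thread_result # => :hello_from_thread
p fiber_results # => [:first_step, :second_step]
```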
threads in ruby VMs
JRuby ➜ CONCURRENT & PARALLEL
TruffleRuby ➜ CONCURRENT & PARALLEL
Rubinius ➜ CONCURRENT & PARALLEL
MRI Ruby ➜ CONCURRENT, NOT PARALLEL
THIS TALK ➜ applies to all rubies
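A quick way to see that MRI threads are concurrent even without being parallel: in this sketch, four threads each wait 0.2s (standing in for I/O). On MRI a sleeping or blocked thread releases the Global VM Lock, so the waits overlap; CPU-bound Ruby code would not speed up this way on MRI, while on JRuby/TruffleRuby it can run in parallel. The timings are illustrative and depend on the machine.

```ruby
# Monotonic clock, so wall-clock adjustments do not skew the measurement
start = Process.clock_gettime(Process::CLOCK_MONOTONIC)

# Four threads that each just wait, like threads blocked on network I/O
4.times.map { Thread.new { sleep 0.2 } }.each(&:join)

elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
puts format("4 threads x 0.2s of waiting took %.2fs", elapsed) # close to 0.2s, not 0.8s
```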
CONCURRENCY USING THREADS: WHERE?
Web frameworks
● Rails
● HANAMI
● ...
Web SERVERS
● PUMA
● PASSENGER
● WEBRICK
● ...
MESSAGE/JOB PROCESSING
● SIDEKIQ
● SUCKER_PUNCH
● BUNNY
● ...
your code
(or one of your dependencies)
IS already using
MULTIPLE threads
(and/or other concurrency primitives!)
CONCURRENCY USING THREADS: WHY?
Improve Efficiency ➜ Get the most out of your hardware budget
Reduce Latency ➜ Happy users
unsafe patterns:
WHAT TO LOOK OUT FOR
BASED ON TRUE EVENTS
pattern #1
Lazy initialization with ||= is not atomic
def client
  @database_client ||= new_client
end

def new_client
  sleep 1 # simulate connection setup time
  puts "Created database client"
  :client
end
pattern #1
Restart your service under load
5.times.map {
  Thread.new { client }
}.each(&:join)
$ ruby database_client_race.rb
Created database client
Created database client
Created database client
Created database client
Created database client
➜ all your threads will try to connect to the database
pattern #1
Overwhelm your database with new connections ➜ Go over limit?
Resources may get leaked ➜
What if the database client keeps its own threads? Other global resources?
pattern #1
Maybe it's not a database client ➜
Maybe it sends off an e-mail, triggers an alarm, charges someone's credit card, ...
attempted solution for #1
Perhaps adding a Mutex will fix it?
DATABASE_INIT_MUTEX = Mutex.new

def client
  @database_client ||=
    DATABASE_INIT_MUTEX.synchronize {
      new_client
    }
end
attempted solution for #1
$ ruby database_client_race_fix_attempt.rb
Created database client
Created database client
Created database client
Created database client
Created database client
solution for #1
Need to check that other threads haven't gotten there first!
DATABASE_INIT_MUTEX = Mutex.new

def client
  DATABASE_INIT_MUTEX.synchronize {
    @database_client ||= new_client
  }
end
solution for #1
Works!
$ ruby database_client_race_fixed_1.rb
Created database client
note #1
Important note: Make sure that all your threads use the same Mutex.
def client
  Mutex.new.synchronize { # this is wrong! 😭 (a brand-new Mutex on every call)
    @database_client ||= new_client
  }
end
TAKEAWAYS FROM #1
Be careful when being concurrent & lazy
Always re-check state after acquiring the Mutex
All threads need to synchronize on the same Mutex
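To check these takeaways without a real database, here is a self-contained sketch (stdlib only) that runs the racy and the fixed versions side by side and records how many clients each one creates. LazyGateway and count_creations are hypothetical names for illustration; the racy count depends on thread timing, so only the fixed count is guaranteed.

```ruby
class LazyGateway
  attr_reader :creations

  def initialize
    @database_client = nil
    @init_mutex = Mutex.new # one Mutex, shared by every thread using this gateway
    @creations = Queue.new  # Queue#<< is thread-safe, so it can record concurrent calls
  end

  # Racy: several threads can see nil before any of them has stored a client
  def racy_client
    @database_client ||= new_client
  end

  # Fixed: the nil check and the assignment both happen while holding the Mutex
  def safe_client
    @init_mutex.synchronize { @database_client ||= new_client }
  end

  private

  def new_client
    sleep 0.1 # simulate connection setup time (other threads get to run meanwhile)
    @creations << :created
    :client
  end
end

# Starts 5 threads that all ask the same gateway for a client at the same time
def count_creations(method_name)
  gateway = LazyGateway.new
  5.times.map { Thread.new { gateway.public_send(method_name) } }.each(&:join)
  gateway.creations.size
end

puts "racy_client created #{count_creations(:racy_client)} client(s)" # often 5, timing-dependent
puts "safe_client created #{count_creations(:safe_client)} client(s)" # always 1
```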
pattern #2
Don't change and iterate hashes concurrently
@hash = {}
t1 = Thread.new {
  10_000.times { |n| @hash[n] = true }
}
t2 = Thread.new {
  @hash.each { } while @hash.size < 10_000
}
t1.join rescue nil
t2.join rescue nil
pattern #2
$ ruby concurrent_iterate_mutate.rb
Traceback (most recent call last):
2: from concurrent_iterate_mutate.rb:3:in `block in <main>'
1: from concurrent_iterate_mutate.rb:3:in `times'
concurrent_iterate_mutate.rb:3:in `block (2 levels) in <main>':
can't add a new key into hash during iteration (RuntimeError)
Obvious in this example ➜ not at all obvious in production
Solution for #2
Avoid sharing ➜ Create a copy instead
Use a persistent data structure ➜
Similar to copy, but more efficient
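One way to apply "create a copy instead" with plain Ruby is copy-on-write publishing: the writer never mutates a hash that readers can see; it builds a new frozen hash and swaps the reference. This sketch assumes a single writer thread; with several writers the read-merge-assign step would need a Mutex (or something like concurrent-ruby's atomic references), otherwise updates can be lost. Each merge copies the whole hash, which is the cost a persistent data structure reduces.

```ruby
@hash = {}.freeze

writer = Thread.new do
  1_000.times do |n|
    # Build a new version (copy + new key) and publish it; old versions are never touched
    @hash = @hash.merge({ n => true }).freeze
  end
end

reader = Thread.new do
  snapshots_read = 0
  loop do
    snapshot = @hash  # grab the current version once
    snapshot.each { } # safe: nobody mutates a published (frozen) hash
    snapshots_read += 1
    break if snapshot.size == 1_000
  end
  snapshots_read
end

writer.join
puts "Final size: #{@hash.size}, reader went through #{reader.value} snapshot(s)"
```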
Solution for #2
Use concurrent-ruby gem
require 'concurrent'
@hash = Concurrent::Map.new
# instead of
@hash = {}
TAKEAWAYS FROM #2
Avoid mutating any shared object ➜
Not just hashes
Can break only under production traffic ➜
Specs/integration tests are too simple to trigger it
Concurrent Ruby is your friend ➜
If you really need to mutate shared stuff
pattern #3
Thread/Fiber variables behave somewhat like globals
class PerThreadGateway
  def initialize(address)
    @address = address
  end

  def client
    Thread.current[:connection] ||= new_client
  end

  def new_client
    puts "Creating a new client!"
    :"client_for_#{@address}"
  end
end
pattern #3
g1 = PerThreadGateway.new("google.com")
g2 = PerThreadGateway.new("github.com")
puts "For gateway 1, got #{g1.client}"
puts "For gateway 2, got #{g2.client}"
$ ruby thread_local_variables.rb
Creating a new client!
For gateway 1, got client_for_google.com
For gateway 2, got client_for_google.com
SOLUTION FOR #3
Use concurrent-ruby's ThreadLocalVar
require 'concurrent'

class PerThreadGateway
  def initialize(address)
    @address = address
    @thread_local = Concurrent::ThreadLocalVar.new
  end

  def client
    @thread_local.value ||= new_client
  end

  # ...
SOLUTION FOR #3
Each ThreadLocalVar instance keeps a different per-thread value
$ ruby thread_local_variables_fixed.rb
Creating a new client!
For gateway 1, got client_for_google.com
Creating a new client!
For gateway 2, got client_for_github.com
TAKEAWAYS FROM #3
Thread/Fiber-local variables
are a low-level mechanism ➜
Understand them well before using them
Use concurrent-ruby's
higher-level alternative
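One detail that shows how low-level these mechanisms are: despite its name, Thread.current[:key] is fiber-local, while Thread#thread_variable_get/thread_variable_set are shared by all fibers of a thread. The keys below are just illustrative names.

```ruby
# Set one value of each kind from the main fiber
Thread.current[:fiber_local] = :outer
Thread.current.thread_variable_set(:thread_local, :outer)

# Read both back from a new fiber running on the same thread
seen_from_new_fiber = Fiber.new do
  [Thread.current[:fiber_local], Thread.current.thread_variable_get(:thread_local)]
end.resume

p seen_from_new_fiber # => [nil, :outer]
```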
pattern #4
Default values by reference VS using block
array1 = Array.new(5, {})
array2 = Array.new(5) { {} }
hash1 = Hash.new([])
hash2 = Hash.new { [] }
# or even....
local1 = Concurrent::ThreadLocalVar.new([])
local2 = Concurrent::ThreadLocalVar.new { [] }
SPOT THE
DIFFERENCE
pattern #4
array1 = Array.new(5, {})
array1.first[:hello] = :world
puts "Array1 (with Reference): #{array1.inspect}"
# Output:
Array1 (with Reference): [{:hello=>:world}, {:hello=>:world}, {:hello=>:world}, {:hello=>:world}, {:hello=>:world}]
array2 = Array.new(5) { {} }
array2.first[:hello] = :world
puts "Array2 (with Block): #{array2.inspect}"
# Output:
Array2 (with Block): [{:hello=>:world}, {}, {}, {}, {}]
pattern #4
hash1 = Hash.new([])
hash1[:hello] << :world
hash1[:foo] << :bar
puts "Hash1 (with Reference): #{hash1[:hello].inspect}"
# Output:
Hash1 (with Reference): [:world, :bar]
hash2 = Hash.new { [] }
hash2[:hello] << :world
hash2[:foo] << :bar
puts "Hash2 (with Block): #{hash2[:hello].inspect}"
# Output:
Hash2 (with Block): []
pattern #4
local1 = Concurrent::ThreadLocalVar.new([])
Thread.new { local1.value << :hello }.join
Thread.new { local1.value << :world }.join
puts "Threadlocal1 (with Reference): #{Thread.new { local1.value }.value}"
# Output
Threadlocal1 (with Reference): [:hello, :world]
local2 = Concurrent::ThreadLocalVar.new { [] }
Thread.new { local2.value << :hello }.join
Thread.new { local2.value << :world }.join
puts "Threadlocal2 (with Block): #{Thread.new { local2.value }.value}"
# Output
Threadlocal2 (with Block): []
Takeaways from #4
Default value by reference ➜
Always the same object
Default value using block ➜
Different objects
(unless you return an existing reference)
TAKEAWAYS FROM #4
Both by reference/using block are useful ➜
Be very careful picking the right one
Thread-local variant ➜ Especially evil (every thread shares the same default object)
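Note that Hash.new { [] } from the slides hands out a different array each time but never stores it, so the appended values are lost. A common alternative, sketched below with illustrative variable names, is a default block that also stores the new object.

```ruby
# From the slides: the block returns a new array, but never stores it in the hash
hash2 = Hash.new { [] }
hash2[:hello] << :world
p hash2.key?(:hello) # => false, so :world went into a throwaway array

# Storing the new array inside the block keeps one separate array per key
hash3 = Hash.new { |hash, key| hash[key] = [] }
hash3[:hello] << :world
hash3[:foo] << :bar

p hash3[:hello] # => [:world]
p hash3[:foo]   # => [:bar]
p hash3.keys    # => [:hello, :foo]
```

This default block is still a check-then-store, so it is not safe to share between threads without a lock: two threads hitting the same missing key can each create an array, and one of them can be lost. concurrent-ruby's Concurrent::Map offers compute_if_absent for that case.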
pattern #5
require, autoload and class definitions are not atomic
class Foo
  def hello; puts "Hello!"; end
  sleep 1 # Simulate slow load
  def bye; puts "Goodbye!"; end
end
pattern #5
Thread.new { require './foo' }
sleep 0.1
f = Foo.new
f.hello
f.bye
$ ruby background_require.rb
Hello!
Traceback (most recent call last):
background_require.rb:7:in `<main>': undefined method `bye' for #<Foo:0x00005622a9a3cbf0> (NoMethodError)
SOLUTION FOR #5
Require before use (in the current thread)
Alternative ➜
Require once before all threads start
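A self-contained sketch of the "require once before all threads start" alternative: to avoid depending on a separate foo.rb, it writes the slow-loading Foo class to a temporary file (a hypothetical setup step), requires it in the main thread, and only then starts worker threads.

```ruby
require 'tmpdir'

results = Dir.mktmpdir do |dir|
  # Hypothetical setup: write the slow-loading Foo class to a temporary file
  path = File.join(dir, 'foo.rb')
  File.write(path, <<~RUBY)
    class Foo
      def hello; "Hello!"; end
      sleep 0.2 # simulate slow load
      def bye; "Goodbye!"; end
    end
  RUBY

  # Fix: finish loading in the main thread, before any other thread starts
  require path

  # Every worker now sees the complete class; the block's value is returned by mktmpdir
  3.times.map { Thread.new { Foo.new.bye } }.map(&:value)
end

p results # => ["Goodbye!", "Goodbye!", "Goodbye!"]
```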
TAKEAWAYS FROM #5
Be careful when being concurrent & lazy ➜ (Again)
Ruby's internal structures are thread-safe ➜ but often that doesn't extend to your code
pattern #6
Shared worker/interactor/controller with state
class Worker
  def call(foo, bar)
    @foo = foo; @bar = bar
    some_domain_logic
    @foo + @bar
  end

  def some_domain_logic; sleep(rand) end
end

worker = Worker.new
t1 = Thread.new { puts "Result for w1 is #{worker.call(1, 2)}" }
t2 = Thread.new { puts "Result for w2 is #{worker.call(3, 4)}" }
t1.join; t2.join
pattern #6
$ ruby shared_worker.rb
Result for w1 is 3
Result for w2 is 3
$ ruby shared_worker.rb
Result for w1 is 7
Result for w2 is 7
SOLUTION FOR #6
worker1 = Worker.new
worker2 = Worker.new
t1 = Thread.new { puts "Result for w1 is #{worker1.call(1, 2)}" }
t2 = Thread.new { puts "Result for w2 is #{worker2.call(3, 4)}" }
t1.join; t2.join
TAKEAWAYS FROM #6
Don't share objects and mutate their state at the same time 😁 😁 😁
Objects are cheap ➜
Be careful about over-optimization
Ruth Hartnup https://www.flickr.com/photos/ruthanddave/8300870890 CC-BY 2.0
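Another option, different from the per-thread instances on the slides, is to keep per-call state in arguments and local variables, so a single shared worker has nothing to clobber. StatelessWorker is a hypothetical variant of the Worker class above.

```ruby
class StatelessWorker
  def call(foo, bar)
    some_domain_logic
    foo + bar # only arguments/locals are used, so concurrent calls cannot interfere
  end

  def some_domain_logic; sleep(rand * 0.1) end
end

worker = StatelessWorker.new # one instance, safely shared by both threads
t1 = Thread.new { worker.call(1, 2) }
t2 = Thread.new { worker.call(3, 4) }

p [t1.value, t2.value] # => [3, 7]
```

If a worker really needs instance variables, creating a fresh object per call (Worker.new.call(1, 2)) gives the same isolation.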
KEY TAKEAWAYS
Use concurrent-ruby!
● Thread pools
● Elixir/Erlang-like actors
● Go-like channels
● Promises
● Maybe/Optional
KEY TAKEAWAYS
Use concurrent-ruby!
● Supports MRI Ruby & JRuby/TruffleRuby/Rubinius
● Dependency of activesupport ➜ If you use Rails, you already have it
KEY TAKEAWAYS
Do not mutate shared objects
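Plain Ruby can at least turn accidental mutation of a shared object into a loud error: freeze it before any thread can see it. This is a sketch with a hypothetical SHARED_CONFIG; note that freeze is shallow, so nested objects need their own freeze.

```ruby
# Hypothetical shared configuration, frozen (including the nested array) before sharing
SHARED_CONFIG = {
  hosts: ["google.com", "github.com"].freeze,
  retries: 3
}.freeze

error = Thread.new do
  begin
    SHARED_CONFIG[:retries] = 5 # a thread trying to mutate shared state...
    nil
  rescue FrozenError => e
    e # ...gets an exception instead of silently racing with other threads
  end
end.value

puts error.class             # => FrozenError
puts SHARED_CONFIG[:retries] # => 3
```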
KEY TAKEAWAYS
Use immutable data structures ➜
persistent-dmnd gem
require 'persistent-💎'
immutable_array = a💎[:hello, :talkdesk]
immutable_hash = h💎[I: 'hope', "you're": 'enjoying']
immutable_set = s💎[:the, :"weather!"]
KEY TAKEAWAYS
JRuby/TruffleRuby/Rubinius's
threads are parallel ➜
Faster concurrency!
FINAL KEY TAKEAWAY
Have fun! 🙋
THANK YOU!
my ruby/jruby/whatever blog: https://ivoanjo.me
Code examples @ gitlab.com/ivoanjo/unsafe-concurrent-ruby-patterns
Concurrent-ruby gem www.concurrent-ruby.com
Persistent-💎 gem gitlab.com/ivoanjo/persistent-dmnd
@knux
By Ivo Anjo
Slides for my "Is this ok...? Or, spotting unsafe concurrent Ruby patterns" talk at the Fullstack LX Meetup