IS THIS OK...?

OR, SPOTTING UNSAFE CONCURRENT RUBY PATTERNS

who am i

IVO anjo

Ruby

CONCURRENCY

JRuby

amazon profiler team

previously @ talkdesk

CONCURRENCY: WHAT?

DEALING WITH
multiple things
during the same (overlapping) time period

RUBY Concurrency primitives

Threads

FIBERS

PROCESSES

Focus for this talk

Threads

threads in ruby VMs

CONCURRENT: JRuby, TruffleRuby, Rubinius, MRI Ruby

PARALLEL: JRuby, TruffleRuby, Rubinius

NOT PARALLEL: MRI Ruby

THIS TALK applies to all rubies: JRuby, TruffleRuby, Rubinius, MRI Ruby

CONCURRENCY USING THREADS: WHERE?

Web frameworks

● Rails
● HANAMI
● ...

Web SERVERS

● PUMA
● PASSENGER
● WEBRICK
● ...

MESSAGE/
JOB PROCESSING

● SIDEKIQ
● sucker_PUNCH
● BUNNY
● ...

your code

(or one of your dependencies)

IS already using
MULTIPLE threads

(and/or other concurrency primitives!)

CONCURRENCY USING THREADS: WHY?

Improve Efficiency

Get the most out of your hardware budget

Reduce Latency

Happy users

unsafe patterns:
WHAT TO LOOK OUT FOR

BASED ON TRUE EVENTS

pattern #1

Lazy initialization with ||= is not atomic

def client
  @database_client ||= new_client
end

def new_client
  sleep 1 # simulate connection setup time
  puts "Created database client"
  :client
end

pattern #1

Restart your service under load 

5.times.map {
  Thread.new { client }
}.each(&:join)

$ ruby database_client_race.rb
Created database client
Created database client
Created database client
Created database client
Created database client

➜ all your threads will try to connect to the database

pattern #1

Overwhelm your database with new connections
Go over limit?

Resources may get leaked
What if the database client keeps own threads? Other global resources?

pattern #1

Maybe it's not a database client 
Maybe it sends off an e-mail, triggers an alarm, charges someone's credit card, ...

attempted solution for #1

Perhaps adding a Mutex will fix it?

DATABASE_INIT_MUTEX = Mutex.new

def client
  @database_client ||=
    DATABASE_INIT_MUTEX.synchronize {
      new_client
    }
end

attempted solution for #1

$ ruby database_client_race_fix_attempt.rb
Created database client
Created database client
Created database client
Created database client
Created database client

solution for #1

Need to check that other threads haven't gotten there first!

DATABASE_INIT_MUTEX = Mutex.new

def client
  DATABASE_INIT_MUTEX.synchronize {
    @database_client ||= new_client
  }
end

solution for #1

Works!

$ ruby database_client_race_fixed_1.rb
Created database client

note #1

Important note: Make sure that all your threads use the same Mutex.

def client
  Mutex.new.synchronize { # this is wrong! 😭
    @database_client ||= new_client
  }
end

TAKEAWAYS FROM #1

Be careful when being concurrent & lazy

Always re-check state after acquiring the Mutex

All threads need to synchronize
on the same Mutex
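
The three takeaways above can be sketched in one small class. This is an illustration under assumptions, not code from the talk: `LazyClientHolder`, its `creations` counter and the shortened sleep are hypothetical, chosen only to make the effect easy to observe.

```ruby
# Hypothetical holder class; the names are illustrative only.
class LazyClientHolder
  attr_reader :creations

  def initialize
    @init_mutex = Mutex.new # one Mutex, shared by every caller of #client
    @creations = 0
    @client = nil
  end

  def client
    @init_mutex.synchronize do
      # Re-check inside the lock: another thread may have gotten here first.
      @client ||= new_client
    end
  end

  private

  # Only ever called while @init_mutex is held.
  def new_client
    sleep 0.05 # simulate connection setup time
    @creations += 1
    :client
  end
end

holder = LazyClientHolder.new
results = 5.times.map { Thread.new { holder.client } }.map(&:value)
puts "Created #{holder.creations} client(s); threads got #{results.uniq.inspect}"
```

Because every thread runs the `||=` check while holding the same Mutex, only the first one should reach `new_client`.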

pattern #2

Don't change and iterate hashes concurrently

@hash = {}
t1 = Thread.new {
  10_000.times { |n| @hash[n] = true }
}
t2 = Thread.new {
  @hash.each { } while @hash.size < 10_000
}
t1.join rescue nil
t2.join rescue nil

pattern #2

$ ruby concurrent_iterate_mutate.rb
Traceback (most recent call last):
        2: from concurrent_iterate_mutate.rb:3:in `block in <main>'
        1: from concurrent_iterate_mutate.rb:3:in `times'
concurrent_iterate_mutate.rb:3:in `block (2 levels) in <main>':
  can't add a new key into hash during iteration (RuntimeError)

Jumps out in this example, not at all obvious in production

Solution for #2

Avoid sharing
Create a copy instead

Use a persistent data structure 
Similar to copy, but more efficient
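
A minimal sketch of the "create a copy" option, using only the standard library. `SnapshotRegistry` and its methods are hypothetical names, and copying on every read is assumed to be affordable for the hash sizes involved.

```ruby
# Hypothetical wrapper: writers and copiers share one Mutex,
# readers iterate over a private copy outside the lock.
class SnapshotRegistry
  def initialize
    @mutex = Mutex.new
    @entries = {}
  end

  def add(key, value)
    @mutex.synchronize { @entries[key] = value }
  end

  # Returns a shallow copy taken while no writer can be mid-update.
  def snapshot
    @mutex.synchronize { @entries.dup }
  end
end

registry = SnapshotRegistry.new
writer = Thread.new { 10_000.times { |n| registry.add(n, true) } }
reader = Thread.new do
  loop do
    copy = registry.snapshot
    copy.each { |_key, _value| } # iterates the copy, which the writer never touches
    break if copy.size >= 10_000
  end
end
[writer, reader].each(&:join)
puts "Final size: #{registry.snapshot.size}"
```

The copy is shallow: the values themselves are still shared, so this only helps when they are not mutated either.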

Solution for #2

Use concurrent-ruby gem

require 'concurrent'

@hash = Concurrent::Map.new

# instead of
@hash = {}

TAKEAWAYS FROM #2

Avoid mutating any shared object
Not just hashes

May only break under production traffic
Specs/integration tests are often too simple to catch it

Concurrent Ruby is your friend 
If you really need to mutate shared stuff

pattern #3

Thread/Fiber-local variables behave somewhat like globals

class PerThreadGateway
  def initialize(address)
    @address = address
  end
  def client
    Thread.current[:connection] ||= new_client
  end
  def new_client
    puts "Creating a new client!"
    :"client_for_#{@address}"
  end
end

pattern #3

g1 = PerThreadGateway.new("google.com")
g2 = PerThreadGateway.new("github.com")
puts "For gateway 1, got #{g1.client}"
puts "For gateway 2, got #{g2.client}"

$ ruby thread_local_variables.rb
Creating a new client!
For gateway 1, got client_for_google.com
For gateway 2, got client_for_google.com

SOLUTION FOR #3

Use concurrent-ruby's ThreadLocalVar

require 'concurrent'

class PerThreadGateway
  def initialize(address)
    @address = address
    @thread_local = Concurrent::ThreadLocalVar.new
  end
  def client
    @thread_local.value ||= new_client
  end
  # ...

SOLUTION FOR #3

Each ThreadLocalVar instance keeps a different per-thread value

$ ruby thread_local_variables_fixed.rb
Creating a new client!
For gateway 1, got client_for_google.com
Creating a new client!
For gateway 2, got client_for_github.com

TAKEAWAYS FROM #3

Thread/Fiber-local variables
are a low-level mechanism 
Understand it well before using

Use concurrent-ruby's
higher-level alternative

pattern #4

Default values by reference VS using block

array1 = Array.new(5, {})
array2 = Array.new(5) { {} }

hash1 = Hash.new([])
hash2 = Hash.new { [] }

# or even....
local1 = Concurrent::ThreadLocalVar.new([])
local2 = Concurrent::ThreadLocalVar.new { [] }

SPOT THE
DIFFERENCE

pattern #4

array1 = Array.new(5, {})
array1.first[:hello] = :world
puts "Array1 (with Reference): #{array1.inspect}"
# Output:
Array1 (with Reference): [{:hello=>:world},
  {:hello=>:world}, {:hello=>:world}, {:hello=>:world},
  {:hello=>:world}]
array2 = Array.new(5) { {} }
array2.first[:hello] = :world
puts "Array2 (with Block): #{array2.inspect}"
# Output:
Array2 (with Block): [{:hello=>:world}, {}, {}, {}, {}]

pattern #4

hash1 = Hash.new([])
hash1[:hello] << :world
hash1[:foo] << :bar
puts "Hash1 (with Reference): #{hash1[:hello].inspect}"
# Output:
Hash1 (with Reference): [:world, :bar]
hash2 = Hash.new { [] }
hash2[:hello] << :world
hash2[:foo] << :bar
puts "Hash2 (with Block): #{hash2[:hello].inspect}"
# Output:
Hash2 (with Block): []

pattern #4

local1 = Concurrent::ThreadLocalVar.new([])
Thread.new { local1.value << :hello }.join
Thread.new { local1.value << :world }.join
value1 = Thread.new { local1.value }.value
puts "Threadlocal1 (with Reference): #{value1.inspect}"
# Output
Threadlocal1 (with Reference): [:hello, :world]
local2 = Concurrent::ThreadLocalVar.new { [] }
Thread.new { local2.value << :hello }.join
Thread.new { local2.value << :world }.join
value2 = Thread.new { local2.value }.value
puts "Threadlocal2 (with Block): #{value2.inspect}"
# Output
Threadlocal2 (with Block): []

Takeaways from #4

Default value by reference
Always the same object

Default value using block
Different objects
(unless you return an existing reference)

TAKEAWAYS FROM #4

Both by reference/using block are useful
Be very careful picking the right one

Thread-local variant
Especially evil
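
As a sketch of picking the right form: the `hash2` example above loses every appended value because its block returns a fresh array without storing it. A block that assigns into the hash keeps one array per key. `build_grouping_hash` is a hypothetical helper name.

```ruby
# Hypothetical helper: each missing key gets its own array, stored on first access.
def build_grouping_hash
  Hash.new { |hash, key| hash[key] = [] }
end

groups = build_grouping_hash
groups[:hello] << :world
groups[:foo] << :bar

puts "hello: #{groups[:hello].inspect}" # [:world]
puts "foo:   #{groups[:foo].inspect}"   # [:bar]
```

Note that this variant mutates the hash on read, so a hash like this that is shared between threads runs straight into pattern #2.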

pattern #5

class Foo
  def hello; puts "Hello!"; end

  sleep 1 # Simulate slow load

  def bye; puts "Goodbye!"; end
end

require, autoload and class definitions are not atomic

pattern #5

Thread.new { require './foo' }

sleep 0.1
f = Foo.new
f.hello
f.bye

$ ruby background_require.rb
Hello!
Traceback (most recent call last):
background_require.rb:7:in `<main>': undefined method
`bye' for #<Foo:0x00005622a9a3cbf0> (NoMethodError)

SOLUTION FOR #5

Require before use (in the current thread)

Alternative
Require once before all threads start

TAKEAWAYS FROM #5

Be careful when being concurrent & lazy (Again)

Ruby's internal structures are thread-safe,
but many times that doesn't extend to your code
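
A minimal sketch of "require once before all threads start". The standard library's `json` stands in for `./foo` here, and `encode_in_threads` is a hypothetical helper.

```ruby
require 'json' # loaded eagerly on the main thread, before any worker exists

# Workers only use the library; none of them triggers a load.
def encode_in_threads(payloads)
  payloads.map { |payload| Thread.new { JSON.generate(payload) } }.map(&:value)
end

puts encode_in_threads([{ a: 1 }, { b: 2 }])
```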

pattern #6

Shared worker/interactor/controller with state

class Worker
  def call(foo, bar)
    @foo = foo; @bar = bar
    some_domain_logic
    @foo + @bar
  end
  def some_domain_logic; sleep(rand) end
end

worker = Worker.new

t1 = Thread.new { puts "Result for w1 is #{worker.call(1, 2)}" }
t2 = Thread.new { puts "Result for w2 is #{worker.call(3, 4)}" }
t1.join; t2.join

pattern #6

$ ruby shared_worker.rb 
Result for w1 is 3
Result for w2 is 3

$ ruby shared_worker.rb 
Result for w1 is 7
Result for w2 is 7

SOLUTION FOR #6

worker1 = Worker.new
worker2 = Worker.new

t1 = Thread.new { puts "Result for w1 is #{worker1.call(1, 2)}" }
t2 = Thread.new { puts "Result for w2 is #{worker2.call(3, 4)}" }

TAKEAWAYS FROM #6

Don't share objects and mutate their state at the same time 😁 😁 😁

Objects are cheap
Be careful about over-optimization
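
One way to make the "objects are cheap" advice hard to get wrong is a class-level entry point that builds a fresh instance per call. This is a sketch, not the talk's code: `PerCallWorker` is a hypothetical name and the sleep is shortened.

```ruby
# Hypothetical worker: state lives in an instance that is never shared.
class PerCallWorker
  def self.call(foo, bar)
    new(foo, bar).call
  end

  def initialize(foo, bar)
    @foo = foo
    @bar = bar
  end

  def call
    some_domain_logic
    @foo + @bar
  end

  private

  def some_domain_logic
    sleep(rand / 10)
  end
end

t1 = Thread.new { PerCallWorker.call(1, 2) }
t2 = Thread.new { PerCallWorker.call(3, 4) }
puts "Result for w1 is #{t1.value}" # 3
puts "Result for w2 is #{t2.value}" # 7
```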

Ruth Hartnup  https://www.flickr.com/photos/ruthanddave/8300870890 CC-BY 2.0

KEY TAKEAWAYS

Use concurrent-ruby!

● Thread pools
● Elixir/Erlang-like actors
● Go-like channels
● Promises
● Maybe/Optional

KEY TAKEAWAYS

Use concurrent-ruby!

● Supports MRI Ruby & JRuby/TruffleRuby/Rubinius
● Dependency of activesupport: you already have it

KEY TAKEAWAYS

Do not mutate shared objects

KEY TAKEAWAYS

Use immutable data structures
persistent-dmnd gem

require 'persistent-💎'

immutable_array = a💎[:hello, :talkdesk]
immutable_hash = h💎[I: 'hope', "you're": 'enjoying']
immutable_set = s💎[:the, :"weather!"]
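
Without any gem, plain `freeze` gives a rough approximation of the same idea. This is a standard-library sketch, not the persistent-💎 API: `SETTINGS` and `with_extra_host` are hypothetical, and each "update" copies the whole structure instead of sharing it efficiently.

```ruby
# Hypothetical shared configuration, frozen at every level.
SETTINGS = { retries: 3, hosts: %w[a.example b.example].freeze }.freeze

# "Updating" builds a new frozen hash; the original is left untouched.
def with_extra_host(settings, host)
  settings.merge(hosts: (settings[:hosts] + [host]).freeze).freeze
end

updated = with_extra_host(SETTINGS, "c.example")
puts "Original hosts: #{SETTINGS[:hosts].size}" # 2
puts "Updated hosts:  #{updated[:hosts].size}"  # 3

begin
  SETTINGS[:retries] = 5
rescue FrozenError => e
  puts "Mutation rejected: #{e.class}"
end
```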

KEY TAKEAWAYS

JRuby/TruffleRuby/Rubinius's
threads are parallel
Faster concurrency!

FINAL KEY TAKEAWAY

Have fun! 🙋

THANK YOU!

my ruby/jruby/whatever blog: https://ivoanjo.me
@knux

By Ivo Anjo

Slides for my "Is this ok...? Or, spotting unsafe concurrent Ruby patterns" talk at the Fullstack LX Meetup
