IS THIS OK...?

OR, SPOTTING UNSAFE CONCURRENT RUBY PATTERNS

💡 A video recording of this talk is available at https://youtu.be/5VHc27SYAIw

who am i

IVO anjo

Ruby ❤

CONCURRENCY ❤

JRuby ❤

amazon profiler team

previously @ talkdesk

CONCURRENCY: WhAT?

DEALING WITH
multiple things

during

the same

overlapping time period

RUBY Concurrency primitives

Threads

FIBERS

PROCESSES

Focus for this talk

Threads

threads in ruby VMs

CONCURRENT ➜ JRuby, TruffleRuby, Rubinius, MRI Ruby

PARALLEL ➜ JRuby, TruffleRuby, Rubinius

NOT PARALLEL ➜ MRI Ruby

parallel?

(diagram: Thread 1 and Thread 2 activity plotted against time)

talk applies to all rubies

JRuby

TruffleRuby

Rubinius

MRI Ruby

THIS TALK

CONCURRENCY USING THREADS: WHERE?

Web frameworks

● Rails
● HANAMI
● ...

Web SERVERS

● PUMA
● PASSENGER
● WEBRICK
● ...

MESSAGE/
JOB PROCESSING

● SIDEKIQ
● sucker_PUNCH
● BUNNY
● ...

your code

(or one of your dependencies)

IS already using
MULTIPLE threads

(and/or other concurrency primitives!)

CONCURRENCY USING THREADS: WHY?

Improve Efficiency ➜

Get the most out of your hardware budget

Reduce Latency ➜ Happy users

unsafe patterns:
WHAT TO LOOK OUT FOR

BASED ON TRUE EVENTS

pattern #1

Lazy initialization with ||= is not atomic

def client
  @database_client ||= new_client
end

def new_client
  sleep 1 # simulate connection setup time
  puts "Created database client"
  :client
end

pattern #1

Restart your service under load

5.times.map {
  Thread.new { client } }.each(&:join)

$ ruby database_client_race.rb
Created database client
Created database client
Created database client
Created database client
Created database client

➜ all your threads will try to connect to the database

pattern #1

Overwhelm your database with new connections ➜ Go over limit?

Resources may get leaked ➜
What if the database client keeps its own threads? Or other global resources?

pattern #1

Maybe it's not a database client ➜
Maybe it sends off an e-mail, triggers an alarm, charges someone's credit card, ...

attempted solution for #1

Perhaps adding a Mutex will fix it?

DATABASE_INIT_MUTEX = Mutex.new

def client
  @database_client ||=
    DATABASE_INIT_MUTEX.synchronize {
      new_client
    }
end

attempted solution for #1

$ ruby database_client_race_fix_attempt.rb
Created database client
Created database client
Created database client
Created database client
Created database client

solution for #1

Need to check that other threads haven't gotten there first!

DATABASE_INIT_MUTEX = Mutex.new

def client
  DATABASE_INIT_MUTEX.synchronize {
    @database_client ||= new_client
  }
end

solution for #1

Works!

$ ruby database_client_race_fixed_1.rb
Created database client

note #1

Important note: Make sure that all your threads use the same Mutex.

def client
  Mutex.new.synchronize { # this is wrong! 😭
    @database_client ||= new_client
  }
end

TAKEAWAYS FROM #1

Be careful when being concurrent & lazy

Always re-check state after acquiring the Mutex

All threads need to synchronize
on the same Mutex
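
The takeaways above can be sketched as a small class. This is only an illustration: `Gateway`, `created_count` and the simulated delay are hypothetical names, not part of the talk's examples. The Mutex is created once in `initialize`, so every thread calling `client` on the same object synchronizes on the same lock, and `||=` re-checks the state inside it.

```ruby
class Gateway
  attr_reader :created_count

  def initialize
    @init_mutex = Mutex.new # created once, shared by every caller of #client
    @created_count = 0
  end

  def client
    @init_mutex.synchronize do
      # Re-check inside the lock: another thread may have gotten here first
      @database_client ||= new_client
    end
  end

  private

  def new_client
    sleep 0.1 # simulate connection setup time
    @created_count += 1
    :client
  end
end

gateway = Gateway.new
clients = 5.times.map { Thread.new { gateway.client } }.map(&:value)
puts "Created #{gateway.created_count} client(s), got #{clients.uniq.inspect}"
# Prints: Created 1 client(s), got [:client]
```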

pattern #2

Don't change and iterate hashes concurrently

@hash = {}
t1 = Thread.new {
  10_000.times { |n| @hash[n] = true }
}
t2 = Thread.new {
  @hash.each { } while @hash.size < 10_000
}
t1.join rescue nil
t2.join rescue nil

pattern #2

$ ruby concurrent_iterate_mutate.rb
Traceback (most recent call last):
        2: from concurrent_iterate_mutate.rb:3:in `block in <main>'
        1: from concurrent_iterate_mutate.rb:3:in `times'
concurrent_iterate_mutate.rb:3:in `block (2 levels) in <main>':
  can't add a new key into hash during iteration (RuntimeError)

Obvious in this example ➜ not at all obvious in production

Solution for #2

Avoid sharing ➜ Create a copy instead

Use a persistent data structure ➜
Similar to copy, but more efficient
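
A minimal sketch of the copy approach, using only the standard library. `SharedRegistry`, `add` and `snapshot` are hypothetical names chosen for illustration. Writers and the copy step share one Mutex, and readers iterate a private copy that no other thread can change.

```ruby
class SharedRegistry
  def initialize
    @lock = Mutex.new
    @hash = {}
  end

  def add(key)
    @lock.synchronize { @hash[key] = true }
  end

  # Returns a private copy; callers iterate it without holding the lock
  def snapshot
    @lock.synchronize { @hash.dup }
  end
end

registry = SharedRegistry.new
writer = Thread.new { 1_000.times { |n| registry.add(n) } }
reader = Thread.new { registry.snapshot.each { } while registry.snapshot.size < 1_000 }
writer.join
reader.join
puts "Final size: #{registry.snapshot.size}"
# Prints: Final size: 1000
```

Copying on every read costs time and memory as the hash grows, which is where the persistent data structure option becomes attractive.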

Solution for #2

Use concurrent-ruby gem

require 'concurrent'

@hash = Concurrent::Map.new

# instead of
@hash = {}

TAKEAWAYS FROM #2

Avoid mutating any shared object ➜
Not just hashes

Can break only under production traffic ➜
Specs/integration tests are too simple to catch it

Concurrent Ruby is your friend ➜
If you really need to mutate shared stuff

pattern #3

Thread/Fiber variables behave somewhat like globals

class PerThreadGateway
  def initialize(address)
    @address = address
  end
  def client
    Thread.current[:connection] ||= new_client
  end
  def new_client
    puts "Creating a new client!"
    :"client_for_#{@address}"
  end
end

pattern #3

g1 = PerThreadGateway.new("google.com")
g2 = PerThreadGateway.new("github.com")
puts "For gateway 1, got #{g1.client}"
puts "For gateway 2, got #{g2.client}"

$ ruby thread_local_variables.rb
Creating a new client!
For gateway 1, got client_for_google.com
For gateway 2, got client_for_google.com

SOLUTION FOR #3

Use concurrent-ruby's ThreadLocalVar

require 'concurrent'

class PerThreadGateway
  def initialize(address)
    @address = address
    @thread_local = Concurrent::ThreadLocalVar.new
  end
  def client
    @thread_local.value ||= new_client
  end
  # ...

SOLUTION FOR #3

Each ThreadLocalVar instance keeps a different per-thread value

$ ruby thread_local_variables_fixed.rb
Creating a new client!
For gateway 1, got client_for_google.com
Creating a new client!
For gateway 2, got client_for_github.com

TAKEAWAYS FROM #3

Thread/Fiber-local variables
are a low-level mechanism ➜
Understand it well before using

Use concurrent-ruby's
higher-level alternative
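
One low-level detail worth seeing in code: `Thread.current[:key]` is fiber-local despite its name, while `thread_variable_get`/`thread_variable_set` are truly thread-local. The sketch below uses a hypothetical `:request_id` key and a hypothetical helper method to show the difference.

```ruby
def values_seen_inside_fiber
  Thread.current[:request_id] = 42                    # fiber-local storage
  Thread.current.thread_variable_set(:request_id, 42) # thread-local storage

  # A new fiber on the same thread starts with empty fiber-local storage
  Fiber.new do
    [Thread.current[:request_id],
     Thread.current.thread_variable_get(:request_id)]
  end.resume
end

puts values_seen_inside_fiber.inspect
# Prints: [nil, 42]
```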

pattern #4

Default values by reference VS using block

array1 = Array.new(5, {})
array2 = Array.new(5) { {} }

hash1 = Hash.new([])
hash2 = Hash.new { [] }

# or even....
local1 = Concurrent::ThreadLocalVar.new([])
local2 = Concurrent::ThreadLocalVar.new { [] }

SPOT THE
DIFFERENCE

pattern #4

array1 = Array.new(5, {})
array1.first[:hello] = :world
puts "Array1 (with Reference): #{array1.inspect}"
# Output:
Array1 (with Reference): [{:hello=>:world},
  {:hello=>:world}, {:hello=>:world}, {:hello=>:world},
  {:hello=>:world}]

array2 = Array.new(5) { {} }
array2.first[:hello] = :world
puts "Array2 (with Block): #{array2.inspect}"
# Output:
Array2 (with Block): [{:hello=>:world}, {}, {}, {}, {}]

pattern #4

hash1 = Hash.new([])
hash1[:hello] << :world
hash1[:foo] << :bar
puts "Hash1 (with Reference): #{hash1[:hello].inspect}"
# Output:
Hash1 (with Reference): [:world, :bar]

hash2 = Hash.new { [] }
hash2[:hello] << :world
hash2[:foo] << :bar
puts "Hash2 (with Block): #{hash2[:hello].inspect}"
# Output:
Hash2 (with Block): []

pattern #4

local1 = Concurrent::ThreadLocalVar.new([])
Thread.new { local1.value << :hello }.join
Thread.new { local1.value << :world }.join
puts "Threadlocal1 (with Reference): #{Thread.new { local1.value }.value}"
# Output
Threadlocal1 (with Reference): [:hello, :world]

local2 = Concurrent::ThreadLocalVar.new { [] }
Thread.new { local2.value << :hello }.join
Thread.new { local2.value << :world }.join
puts "Threadlocal2 (with Block): #{Thread.new { local2.value }.value}"
# Output
Threadlocal2 (with Block): []

Takeaways from #4

Default value by reference ➜
Always the same object

Default value using block ➜
Different objects
(unless you return an existing reference)

TAKEAWAYS FROM #4

Both by reference/using block are useful ➜
Be very careful picking the right one

Thread-local variant ➜ Especially evil
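
The `Hash.new { [] }` example prints `[]` because the block builds a fresh array on every missing-key lookup and never stores it. A common variant, sketched here with a hypothetical `build_index` helper, assigns the new object into the hash inside the block so that later lookups return the same array.

```ruby
def build_index
  # The block receives the hash and the missing key; storing the new array
  # under that key keeps it for subsequent lookups
  index = Hash.new { |hash, key| hash[key] = [] }
  index[:hello] << :world
  index[:foo] << :bar
  index
end

index = build_index
puts "Hash (block that stores): #{index[:hello].inspect}"
# Prints: Hash (block that stores): [:world]
```

Note that the storing block writes to the hash on a plain read, so with a shared hash it becomes a mutation of shared state.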

pattern #5

class Foo
  def hello; puts "Hello!"; end

  sleep 1 # Simulate slow load

  def bye; puts "Goodbye!"; end
end

require, autoload and class definitions are not atomic

pattern #5

Thread.new { require './foo' }

sleep 0.1
f = Foo.new
f.hello
f.bye

$ ruby background_require.rb
Hello!
Traceback (most recent call last):
background_require.rb:7:in `<main>': undefined method
`bye' for #<Foo:0x00005622a9a3cbf0> (NoMethodError)

SOLUTION FOR #5

Require before use (in the current thread)

Alternative ➜
Require once before all threads start
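
A self-contained sketch of the "require once before all threads start" alternative. The `SlowFoo` class and the temporary file are made up for this example so it can run on its own. The slow file is fully loaded in the main thread, and only then are the worker threads created.

```ruby
require 'tmpdir'

Dir.mktmpdir do |dir|
  path = File.join(dir, 'slow_foo.rb')
  File.write(path, <<~RUBY)
    class SlowFoo
      def hello; "Hello!"; end
      sleep 0.2 # simulate slow load
      def bye; "Goodbye!"; end
    end
  RUBY

  require path # finishes loading here, before any worker thread exists

  results = 3.times.map {
    Thread.new { f = SlowFoo.new; [f.hello, f.bye] }
  }.map(&:value)
  puts results.uniq.inspect
end
# Prints: [["Hello!", "Goodbye!"]]
```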

TAKEAWAYS FROM #5

Be careful when being concurrent & lazy ➜ (Again)

Ruby's internal structures are thread-safe ➜ many times that doesn't extend to your code

pattern #6

Shared worker/interactor/controller with state

class Worker
  def call(foo, bar)
    @foo = foo; @bar = bar
    some_domain_logic
    @foo + @bar
  end
  def some_domain_logic; sleep(rand) end
end

worker = Worker.new

t1 = Thread.new { puts "Result for w1 is #{worker.call(1, 2)}" }
t2 = Thread.new { puts "Result for w2 is #{worker.call(3, 4)}" }
t1.join; t2.join

pattern #6

$ ruby shared_worker.rb 
Result for w1 is 3
Result for w2 is 3

$ ruby shared_worker.rb 
Result for w1 is 7
Result for w2 is 7

SOLUTION FOR #6

worker1 = Worker.new
worker2 = Worker.new

t1 = Thread.new { puts "Result for w1 is #{worker1.call(1, 2)}" }
t2 = Thread.new { puts "Result for w2 is #{worker2.call(3, 4)}" }

TAKEAWAYS FROM #6

Don't share objects and mutate their state at the same time 😁 😁 😁

Objects are cheap ➜
Be careful about over-optimization
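
If a worker really must be shared, another option is to keep per-call data out of instance variables altogether. The sketch below uses a hypothetical `StatelessWorker`, not a class from the talk. The arguments stay in local variables, which belong to each call, so two threads cannot overwrite each other's values.

```ruby
class StatelessWorker
  def call(foo, bar)
    some_domain_logic
    foo + bar # locals belong to this call only, nothing is stored on self
  end

  private

  def some_domain_logic
    sleep(rand / 10)
  end
end

WORKER = StatelessWorker.new
results = [[1, 2], [3, 4]].map { |a, b| Thread.new { WORKER.call(a, b) } }.map(&:value)
puts results.inspect
# Prints: [3, 7]
```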

Ruth Hartnup https://www.flickr.com/photos/ruthanddave/8300870890 CC-BY 2.0

KEY TAKEAWAYS

Use concurrent-ruby!

● Thread pools
● Elixir/Erlang-like actors
● Go-like channels
● Promises
● Maybe/Optional

KEY TAKEAWAYS

● Supports MRI Ruby & JRuby/TruffleRuby/Rubinius

● Dependency of activesupport ➜ You already have it

Use concurrent-ruby!

KEY TAKEAWAYS

Do not mutate shared objects
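
One way to enforce this with plain Ruby is to freeze objects before sharing them, so an accidental mutation raises instead of silently racing. `SETTINGS` and `try_to_mutate` are hypothetical names for this sketch. Note that `freeze` is shallow, which is why the nested array is frozen separately.

```ruby
SETTINGS = {
  retries: 3,
  hosts: ['a.example', 'b.example'].freeze # freeze nested objects too
}.freeze

def try_to_mutate
  SETTINGS[:retries] = 5
  :mutated
rescue FrozenError => e
  e.class
end

puts try_to_mutate
# Prints: FrozenError
```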

KEY TAKEAWAYS

Use immutable data structures ➜
persistent-dmnd gem

require 'persistent-💎'

immutable_array = a💎[:hello, :talkdesk]
immutable_hash = h💎[I: 'hope', "you're": 'enjoying']
immutable_set = s💎[:the, :"weather!"]

KEY TAKEAWAYS

JRuby/TruffleRuby/Rubinius's
threads are parallel ➜
Faster concurrency!

FINAL KEY TAKEAWAY

Have fun! 🙋‍

THANK YOU!

my ruby/jruby/whatever blog: https://ivoanjo.me

Code examples @
  gitlab.com/ivoanjo/unsafe-concurrent-ruby-patterns

Concurrent-ruby gem
  www.concurrent-ruby.com
Persistent-💎 gem
  gitlab.com/ivoanjo/persistent-dmnd

@knux
