Concurrency in Ruby

Concurrency and Parallelism

  • Concurrency: a condition that exists when at least two threads are making progress. A more generalized form of parallelism that can include time-slicing as a form of virtual parallelism.
  • Parallelism: a condition that arises when at least two threads are executing simultaneously.

"Concurrency is about dealing with lots of things at once. Parallelism is about doing lots of things at once." - Rob Pike

Sources:

https://vimeo.com/49718712

http://www.jstorimer.com/blogs/workingwithcode/8085491-nobody-understands-the-gil

http://tenderlovemaking.com/2012/06/18/removing-config-threadsafe.html


Concurrency approaches

  • Threads (Java). Threads share memory
  • New process per request (PHP). No deadlocks, thread-safe
  • Actor model (Erlang, Scala). Message passing between actors
  • AMQP (RabbitMQ, Kafka). Advanced Message Queuing Protocol
  • Hybrid solutions (e.g. Shoryuken, a thread-based message processor for Amazon SQS)

Ruby challenges


GIL & GC

Global Interpreter Lock

The GIL is a locking mechanism meant to protect data integrity in interpreters such as CRuby and CPython. It forces only one thread to run at a time, even on a multicore processor.
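
A minimal sketch (not from the talk) of what that means in practice: a CPU-bound task split across two threads takes roughly as long as running it serially on MRI, while JRuby can use both cores. The workload and thread count are arbitrary.

require 'benchmark'

work = -> { 10_000_000.times { Math.sqrt(42) } }   # CPU-bound, no IO

serial   = Benchmark.realtime { 2.times { work.call } }
threaded = Benchmark.realtime { 2.times.map { Thread.new(&work) }.each(&:join) }

puts format('serial:   %.2fs', serial)
puts format('threaded: %.2fs', threaded)   # roughly equal on MRI; about 2x faster on JRuby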

Garbage Collection

In computer science, garbage collection (GC) is a form of automatic memory management. The garbage collector, or just collector, attempts to reclaim garbage, or memory occupied by objects that are no longer in use by the program. - Wikipedia


Resources other than memory, such as network sockets, database handles, user interaction windows, and file and device descriptors, are not typically handled by garbage collection.

A well-known issue in MRI

  • About 12% of CPU time is spent in garbage collection per process on MRI, compared to under 2% on JRuby
  • Lots of GC pauses on MRI; requests have to wait for those pauses
  • Huge waste of CPU

Source: https://www.youtube.com/watch?v=etCJKDCbCj4
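
If you want to see that cost in your own process, MRI ships with a built-in GC profiler. A small sketch (the workload is a made-up, allocation-heavy stand-in; numbers vary by app and Ruby version):

GC::Profiler.enable

100_000.times { { a: 1, b: 2 }.merge(c: 3) }   # allocation-heavy stand-in workload

puts "GC runs: #{GC.count}"
puts "GC time: #{(GC::Profiler.total_time * 1000).round(1)} ms"

GC::Profiler.disable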

Why GIL?

  • Some C extensions aren't thread-safe
  • Deadlocks
  • Data integrity

Example

Five threads each push 1000 nils onto a shared array (pushing_nil.rb below). MRI always prints 5000 because the GIL serializes each push; JRuby and Rubinius lose elements to the race.

$ ruby pushing_nil.rb
5000

$ jruby pushing_nil.rb
4446

$ rbx pushing_nil.rb
3088
array = []

5.times.map do
  Thread.new do
    1000.times do
      array << nil
    end
  end
end.each(&:join)

puts array.size

Implementation (Array#push in MRI's array.c)

VALUE
rb_ary_push(VALUE ary, VALUE item)
{
    long idx = RARRAY_LEN(ary);

    ary_ensure_room_for_push(ary, 1);
    RARRAY_ASET(ary, idx, item);
    ARY_SET_LEN(ary, idx + 1);
    return ary;
}

Is it so bad?

In web apps each thread spends quite a lot of its time in IO, which won't block the thread scheduler. So if you receive two quasi-concurrent requests you might not even be affected by the GIL.

Yehuda Katz
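
That point is easy to demonstrate: IO-bound threads do overlap on MRI, because a thread releases the GIL while it waits on the socket. A rough sketch (the URLs are placeholders):

require 'net/http'
require 'benchmark'

urls = %w[http://example.com http://example.org http://example.net]

elapsed = Benchmark.realtime do
  # The three requests wait on the network concurrently, even under the GIL.
  urls.map { |url| Thread.new { Net::HTTP.get(URI(url)) } }.each(&:join)
end

puts format('fetched %d pages concurrently in %.2fs', urls.size, elapsed)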

Multiple Process/Forking

  • Passenger
  • Unicorn
  • Resque
  • delayed_job

This approach allows forking as many processes as needed and controlling them if they hang or leak memory.
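
A bare-bones sketch of the idea (not Unicorn or Resque code): the parent forks a few workers, each with its own memory and its own GIL, and supervises them.

worker_count = 3

pids = worker_count.times.map do
  Process.fork do
    # Each child is fully isolated: no shared state, no races, no GIL contention.
    puts "worker #{Process.pid} handling jobs..."
    sleep 1   # stands in for real work
  end
end

pids.each { |pid| Process.wait(pid) }   # the parent reaps children; it could also monitor or kill hung ones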

Actors/Fibers

  • Actor implementations like Celluloid (used by Sidekiq)
  • Fibers
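
Fibers give cooperative concurrency within a single thread, as in the small demo below: nothing runs in parallel, the fiber simply pauses itself with Fiber.yield and the caller decides when to resume it.

producer = Fiber.new do
  3.times do |i|
    Fiber.yield "item #{i}"   # hand control back to the caller
  end
  nil
end

while (item = producer.resume)
  puts item
end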

Non blocking IO

Reactor pattern

  • Goliath
  • Thin
  • Twisted
  • EventMachine
  • Node.js (http://howtonode.org/control-flow-part-ii/file-write.js)

Reactor Pattern

If a DB query (I/O) would block other requests, wrap it in a fiber, trigger an async call, and pause the fiber so another request can be processed. Once the DB query comes back, it wakes up the fiber it was triggered from, which then sends the response back to the client.
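
A minimal sketch of that flow with EventMachine (the eventmachine gem is assumed; async_query is a made-up stand-in for a non-blocking DB call, simulated here with a timer):

require 'eventmachine'
require 'fiber'

def async_query(seconds)
  fiber = Fiber.current
  EM.add_timer(seconds) { fiber.resume("result after #{seconds}s") }   # async "DB call" finishes later
  Fiber.yield   # pause this fiber; the reactor keeps serving other requests meanwhile
end

EM.run do
  Fiber.new do
    puts async_query(1)   # reads like synchronous code, but never blocks the reactor
    EM.stop
  end.resume
end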

Pros/Cons delayed_job


PROS:

  • Easy to setup


CONS:

  • Uses the database to store jobs, which makes it hard to scale
  • Uses a process-based approach
  • No comprehensive monitoring solution, no alerting

Cons/Pros Sidekiq

CONS:

  • Extra dependency on redis

PROS:

  • Scalable, widely used solution
  • Actively supported
  • Thread based, saves resources
  • Out-of-the-box monitoring solution and web UI
  • Rich in features (failed jobs retry automatically, expiring and unique jobs)
  • Scheduled jobs execution
  • Advanced queueing management
  • Clear Ruby API
  • Middleware API
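
A minimal worker sketch tying a few of the features above together (the class, queue name, and arguments are made up; this is not from the talk):

require 'sidekiq'

class ReportWorker
  include Sidekiq::Worker
  sidekiq_options queue: :reports, retry: 5   # failed jobs retry automatically, up to 5 times

  def perform(report_id)
    # generate the report identified by report_id...
  end
end

# Enqueue immediately or on a schedule:
ReportWorker.perform_async(42)
ReportWorker.perform_in(60 * 10, 42)   # run roughly 10 minutes from now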

Sidekiq Enterprise Pros

PROS:

  • Job encryption
  • Retains historical metrics
  • Supports long-running jobs via a rolling-restart strategy
  • Rate limiting for calls to third parties
  • Multi-process

Sidekiq

🔥🔥🔥 PERFORMANCE 🔥🔥🔥

Sidekiq

Distribute load among workers

heavy: sidekiq -C config/sidekiq.yml -c 3 -q document_generation,8 -q bulk_queue,4 -q slow_queue,2

lite: sidekiq -C config/sidekiq.yml -c 3 -q reports,8 -q fast_queue,4 -q slow_queue,2

The number after each queue is its weight: higher-weighted queues are checked more often, so document_generation and reports take priority over the slow queues.


Sidekiq

Sufficient out-of-the-box monitoring UI

Sidekiq

Processing in batches


Sidekiq

Prometheus/Grafana monitoring & alerting

Credits:

Concurrency in Ruby

By Roman Bambycha
