ruby's hardest problem

CONCURRENCY AND PARALLELISM

WHO AM I?

CODE MONKEY

why are we here?

NO MORE FREE LUNCH

DARK MAGIC IS COOL

AND BECAUSE

Concurrency

concurrent program is one with multiple threads of control.  Each thread of control has effects on the world, and those threads are interleaved in some arbitrary way by the scheduler.  We say that a concurrent programming language is non-deterministic, because the total effect of the program may depend on the particular interleaving at runtime.  The programmer has the tricky task of controlling this non-determinism using synchronisation, to make sure that the program ends up doing what it was supposed to do regardless of the scheduling order.  And that’s no mean feat, because there’s no reasonable way to test that you have covered all the cases.

 

Simon Marlow

parallelism

parallel program, on the other hand, is one that merely runs on multiple processors, with the goal of hopefully running faster than it would on a single CPU.

 

Simon Marlow

WHY SO MUCH CONFUSION?

1. If you come with a view from the hardware and upwards toward to computing logic, then surely you may fall into the trap of equating the two terms. The reason is that to implement parallel computation in current hardware and software, you use a concurrent semantics. The pthread library for instance is writing a concurrent program which then takes advantage of multiple processors if they are available. 

2. The gist of it, however, is rather simple: computing parallel is to hunt for a program speedup when adding more execution cores to the system running the program. This is a property of the machine. Computing concurrent is to write a program with multiple threads of control such that it will non-deterministically execute each thread. It is a property of the language.

So there are essentially two views: from hardware or from logic. From the perspective of logic, if you have toyed with formal semantics, is that we define concurrency-primitives in a different way than parallelism-primitives formally. That is, they are different entities. The perspective of hardware on the other hand, conflates the ideas because the only way to achieve parallelism is to use
concurrency - especially in UNIX.

THANKS JESPER LOUIS!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

So what we have in ruby (MRI)?

THEN WHY SO MUCH WHINING

GIL

JRUBY AND RUBINIUS TO THE RESCUE

  1. ruby-install
  2. chruby
  3. jruby profiling
  4. jruby manage
  5. rbx profile
  6. rbx memory dump
threads = []

4.times do |n|
  threads << Thread.new do
  
    result = 0
    100000000.times do
      result += 1
    end

  
  end
end

threads.map(&:join)

Execution of ruby 1_counter/working_counter.rb with MRI, Rbx and Jruby with Activity Monitor/glances

Title Text

begin
  last = 0

  loop do

    Thread.new do
      sleep(1000)
    end

    last += 1
  end

rescue
  puts 'We managed to launch ' + "#{last}".green + " threads"
end

Execution of ruby maximum_threads/maximum_threads.rb with Rbx and JRuby

NO REAL CONCURRENCY MODEL

we only got mutexes and locks...

require 'thread'
semaphore = Mutex.new

a = Thread.new {
  semaphore.synchronize {
    # access shared resource
  }
}

b = Thread.new {
  semaphore.synchronize {
    # access shared resource
  }
}

MUTEX

WHAT IS THE PROBLEM WITH LOCKS?

  • Locks do not compose
  • Locks breaks encapsulation (you need to know a lot!)
  • Taking too few locks
  • Taking too many locks
  • Taking the wrong locks
  • Taking locks in wrong order
  • Error recovery is hard
  • It's not the nature of our world!!! (we don't ask the world to stop to do something)

Jonas Boner

Concurrent ruby

  • Agent
  • Promise - Future
  • TimerTask

  • Thread Pools
class Btc
  attr_accessor :price, :runs, :task

  def initialize
    @price = 0
    @runs = 0
    @task =
      Concurrent::TimerTask.new(execution_interval: 5,
                                run_now: true,
                                dup_on_deref: true) do
      @price = update_price
      @runs += 1
    end

    @task.execute
  end

  def update_price
    uri = "https://api.bitcoinaverage.com/ticker/global/USD/"
    response = RestClient.get uri
    json = JSON.parse(response)
    json["ask"]
  end
end

btc = Btc.new

t = Thread.new do
  loop do
    puts 'BTC price: ' + "#{btc.price}".green
    puts 'BTC timer task runs: ' + "#{btc.runs}".green
    sleep 1
  end
end

t.join

Timer Task

score = Concurrent::Agent.new(10)
score.value #=> 10

score << proc{|current| current + 100 }
sleep(0.1)
score.value #=> 110

score << proc{|current| current * 2 }
sleep(0.1)
score.value #=> 220

score << proc{|current| current - 50 }
sleep(0.1)
score.value #=> 170

Agent

An agent is a single atomic value that represents an identity. The current value of the agent can be requested at any time (#deref). Each agent has a work queue and operates on the global thread pool.  A good example of an agent is a shared incrementing counter, such as the score in a video game.

  • Atoms on earth: 10^50
  • 2^K = 10^50 => K = 166  
  • uint32_t in C has 32 bits
  • 166 / 32 = 5.1875

6 integers in C have more possible states than the number of atoms on earth

Sequential code is difficult enough to test

Concurrent code, non deterministic code, is even more difficult to test since it might go through all the possible states

We need a concurrency model

"Every sufficiently complex application/language/tool will either have to use Lisp or reinvent it the hard way."

 

*Where highly concurrent systems are concerned, substitute Erlang for Lisp.

 

Don't communicate by sharing memory... Share memory by communicating

a concurrent object oriented programming framework which lets you build multithreaded programs out of concurrent objects as easily as you build sequential programs out of regular objects

based upon actor model

  • actors are isolated whitin processes
  • actors communicate via async messages
  • messages are buffered by a mailbox
  • actor works off each message

"I thought of objects being like biological cells and/or individual computers on a network, only able to communicate with messages". 


Alan Kay, creator of Smalltalk, on the meaning of "object oriented programming"


class PrimeWorker
  include Celluloid

  def primes(limit)
    primes = []
    1.upto(limit) do |i|
      if i.prime?
        primes << i
      end
    end

    primes
  end
end

prime_pool = PrimeWorker.pool
puts prime_pool.primes 100

PrimeWorker

my own small http server

class HTTPServer
  include Celluloid

  def initialize(port = 8080)
    @server = TCPServer.new(port)

    supervisor = Celluloid::SupervisionGroup.run!
    supervisor.pool(AnswerWorker, as: :answer)

    loop {
      client = @server.accept

      begin
        Celluloid::Actor[:answer].start(client)
      rescue
      end

    }
  end
end
class AnswerWorker
  include Celluloid

  def start(client)
    headers = "HTTP/1.1 200 OK\r\n" +
              "Date: #{Time.now}\r\n" +
              "Content-Type: text\r\n\r\n"
    client.puts headers

    if rand(3) == 0
      client.puts "The system crashed"
      client.close
      throw("ATOMIC BUG")
    else
      client.puts "Hi..bye!"
      client.close
    end
  end
end

Answer Worker

Supervisor restar workers after crash

 

class MyGroup < Celluloid::SupervisionGroup
  supervise HTTPServer
end


MyGroup.run

TIME TO RUN IT!

2014 GOTO => MUTABILITY

and let give a big warm welcome to hamster

Hamster collections are immutable. Whenever you modify a Hamster collection, the original is preserved and a modified copy is returned. This makes them inherently thread-safe and shareable. At the same time, they remain CPU and memory-efficient by sharing between copies.

original = Hamster.list(1, 2, 3)
copy = original.cons(0)      # => Hamster.list(0, 1, 2, 3)

Notice how modifying a list actually returns a new list. That's because Hamster Lists are immutable. Thankfully, just like other Hamster collections, they're also very efficient at making copies!

Hamster.interval(10_000, 1_000_000).filter do |number|
  prime?(number)
end.take(3)
  # => 0.0009s

(10000..1000000).select do |number|
  prime?(number)
end.take(3)
  # => 10s

Ruby concurrency and parallelism

By Federico Carrone

Ruby concurrency and parallelism

  • 1,440