Callback Free Concurrency:

From Node to Elixir

Chris Geihsler

@seejee

 

Agenda

  • Background
  • Web server architecture
  • Node drawbacks
  • Application comparison

  • 3,000,000 students
  • 7,000,000 problems / day
  • 1,000,000,000 problems / year

Live Math Tutoring

  • 1 2X Heroku Dyno
  • 1 Redis DB
  • $50 / month
  • ~70 concurrent sessions
  • 2,000,000 sessions / year

Web Server Architecture

Joe Stanco, Packt Publishing

http://i.stack.imgur.com/BTm1H.png

Drawbacks

Drawback: Callback Hell

var db       = require('db');
var Account  = require('models/account');
var AuditLog = require('models/audit_log');

function withdrawMoney(accountId, amount) {
  var trx = db.beginTransaction();

  try {
    var account = Account.find(accountId);

    account.withdraw(amount);
    AuditLog.createEntry("$" + amount + " withdrawn from account.");

    trx.commit();
  } catch (error) {
    trx.rollback();
    throw error;
  }
}

try {
  withdrawMoney(17, 50.00);
} catch (error) {
  console.log(error);
}

function withdrawMoney(accountId, amount, callback) {
  db.beginTransaction(function(err, trx) {
    if(err) {
      callback(err);
      return;
    }

    Account.find(accountId, function(err, account) {
      if(err) {
        trx.rollback(function() {
          callback(err);
        });
        return;
      }

      account.withdraw(amount, function(err) {
        if(err) {
          trx.rollback(function() {
            callback(err);
          });
          return;
        }

        AuditLog.createEntry("$" + amount + " withdrawn from account.", function(err) {
          if(err) {
            trx.rollback(function() {
              callback(err);
            });
            return;
          }

          trx.commit(function() {
            callback(null); // success
          });
        });
      });
    });
  });
}

var trx = db.beginTransaction();

trx
  .then(function() {
    return Account.find(accountId);
  })
  .then(function(account) {
    return account.withdraw(amount);
  })
  .then(function() {
    return AuditLog.createEntry("$" + amount + " withdrawn from account.");
  })
  // bind so commit and rollback are still invoked on trx
  .then(trx.commit.bind(trx))
  .catch(trx.rollback.bind(trx));

Drawback: Fault Tolerance

The safest way to respond to a thrown error is to shut down the process. 
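One reason a thrown error ends up at the process level is that the code that scheduled a callback is no longer on the stack when the callback runs. The sketch below illustrates this with a hypothetical `later` helper standing in for the event loop; it is a simplified model, not Node's actual scheduler.

```javascript
// Hypothetical stand-in for the event loop: callbacks are queued now and run later.
var pending = [];
function later(callback) {
  pending.push(callback);
}

var caughtByCaller = false;
try {
  later(function() {
    throw new Error('boom');
  });
} catch (error) {
  caughtByCaller = true; // never reached: nothing has thrown yet
}

// Later, the queued callback runs with the caller's try block long gone.
var escaped = null;
try {
  pending.shift()();
} catch (error) {
  escaped = error; // in a real server, only a process-level handler would see this
}

console.log(caughtByCaller);  // false
console.log(escaped.message); // boom
```

The caller's `try` never sees the error, so the only code left to react is whatever sits at the top of the process.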

Drawback: Concurrency

var db = require('db');

var slowQueryHandler = function(req, res) {
  db.exec('SELECT * from really_big_table', function(err, results) {
    // 2 minutes later
    res.render('slow_query', results);
  });
};

var jsonEcho = function(req, res) {
  var params = JSON.parse(req.body);
  // what if req.body is really big? JSON.parse is synchronous,
  // so every other request waits until it finishes.

  res.render('json_echo', params);
};
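The jsonEcho concern can be made concrete with a toy model. The sketch below is not Node's event loop; it uses a hypothetical `runLoop` helper and assumes each task occupies the single thread for a fixed number of milliseconds, with tasks running strictly one after another. The task names and costs are made up for illustration.

```javascript
// Toy model of a single-threaded server (hypothetical, not Node's real scheduler):
// every task runs to completion before the next one starts.
function runLoop(tasks) {
  var clock = 0;          // simulated wall-clock time in ms
  var finishedAt = {};

  tasks.forEach(function(task) {
    clock += task.cpuMs;  // nothing else can run while this task uses the CPU
    finishedAt[task.name] = clock;
  });

  return finishedAt;
}

// Assumed costs: a 2,000 ms parse of a huge body, then a 1 ms request.
var result = runLoop([
  { name: 'bigJsonEcho', cpuMs: 2000 },
  { name: 'tinyRequest', cpuMs: 1 }
]);

console.log(result.tinyRequest); // 2001: the tiny request waited for the big one
```

The slow query handler does not have this problem because the wait happens in the database, not on the JavaScript thread; CPU-bound work is what stalls everyone else.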

What do we really want?

Server Requirements

  • Concurrency (CPU and I/O)
  • Request Isolation
  • Durability
  • Developer Friendliness

+

  • Functional
  • Dynamic
  • Immutable
  • Pattern Matching
  • Efficient Concurrency
  • Hot code-reloading

Memory overhead

  • Thread:         8MB
  • Erlang process: 2KB

Simultaneous Connections

  • Threads:          100-1,000s
  • Erlang processes: 10,000-100,000s

Context Switching

  • 10-100x faster to preempt an Erlang process.
  • Erlang processes are preempted on I/O and CPU.

Isolation

  • OS threads share the heap.
  • Erlang processes share nothing.
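Taking the memory overhead figures from the slides at face value, a quick back-of-the-envelope calculation shows why the connection counts differ so much. The 10,000 connection count below is an assumed example, and the calculation ignores everything except the per-thread and per-process overhead.

```javascript
// Figures taken from the slides; the connection count is an assumed example.
var KB = 1024;
var MB = 1024 * KB;

var threadBytes  = 8 * MB; // per OS thread
var processBytes = 2 * KB; // per Erlang process

function totalMegabytes(count, bytesEach) {
  return (count * bytesEach) / MB;
}

var connections = 10000;

console.log(totalMegabytes(connections, threadBytes));  // 80000 (MB)
console.log(totalMegabytes(connections, processBytes)); // 19.53125 (MB)
console.log(threadBytes / processBytes);                // 4096
```

With one thread per connection the overhead alone is roughly 78 GB, while one Erlang process per connection needs under 20 MB.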

Server Requirements

  • Concurrency (CPU and I/O)
  • Request Isolation
  • Durability
  • Developer Friendliness

OTP

  • "Open Telecom Platform"
  • "Let it crash" philosophy
  • Server building-blocks
  • Nine nines uptime
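To put "nine nines" in perspective, the allowed downtime can be worked out directly. This is plain arithmetic, assuming a 365-day year and reading "N nines" as an availability of 1 - 10^-N; the function name is made up for the example.

```javascript
// Allowed downtime per year for a given number of nines of availability.
// Assumes a 365-day year.
var SECONDS_PER_YEAR = 365 * 24 * 60 * 60; // 31,536,000

function downtimeSecondsPerYear(nines) {
  var unavailability = Math.pow(10, -nines); // e.g. 3 nines -> 0.001
  return SECONDS_PER_YEAR * unavailability;
}

console.log(downtimeSecondsPerYear(3)); // roughly 31,536 s, about 8.76 hours
console.log(downtimeSecondsPerYear(9)); // roughly 0.0315 s, about 31.5 ms
```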

Server Requirements

  • Concurrency (CPU and I/O)
  • Request Isolation
  • Durability
  • Developer Friendliness
  • Ruby-ish syntax
  • Erlang semantics
  • Compiles to Erlang bytecode
  • Metaprogramming
  • Strong tools
  • Web framework written in Elixir
  • Ruby on Rails inspired
  • First-class PubSub, WebSocket support

Server Requirements

  • Concurrency (CPU and I/O)
  • Request Isolation
  • Durability
  • Developer Friendliness

Sample Application

(finally!)

Students

  • Wait in a queue
  • Private chat with a single teacher
  • Respond to every teacher message
  • Disconnect when teacher ends chat

Teachers

  • Chat with five students simultaneously
  • Pull students from the queue
  • End chat after receiving 50 messages

System

  • Know which users are connected
  • Know which chats are in-progress
  • Record the chat logs

Two* Implementations:

  • Node      0.10.33
  • Express   4.9.8
  • Socket.io 1.0

  • Erlang    17
  • Elixir    1.0.2
  • Phoenix   0.11

Model Layer

Teacher Roster

  • Which teachers are connected?
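  • Which students are chatting with a teacher?

var _ = require('underscore'); // assumed; any library providing _.keys works

function TeacherRoster() { this.teachers = {}; }

TeacherRoster.prototype.add = function(teacher, callback) {
  teacher.students = teacher.students || []; // canAcceptMoreStudents reads this list
  this.teachers[teacher.id] = teacher;
  callback(null, teacher);
};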

TeacherRoster.prototype.find = function(id, callback) {
  var t = this.teachers[id];
  callback(null, t);
};

TeacherRoster.prototype.canAcceptMoreStudents = function(teacherId, callback) {
  this.find(teacherId, function(err, teacher) {
    var canAccept = teacher.students.length < 5;
    callback(null, canAccept);
  });
};

TeacherRoster.prototype.claimStudent = function(teacherId, studentId, callback) {
  this.find(teacherId, function(err, teacher) {
    teacher.students.push(studentId);
    callback(null, teacher);
  });
};

TeacherRoster.prototype.stats = function(callback) {
  var stats = { total: _.keys(this.teachers).length };
  callback(null, stats);
};
defmodule ElixirChat.TeacherRoster do
  def new do
    HashDict.new
  end

  def add(roster, teacher) do
    teacher = Map.merge(teacher, %{student_ids: []})
    Dict.put(roster, teacher.id, teacher)
  end

  def find(roster, teacher_id) do
    roster[teacher_id]
  end

  def can_accept_more_students?(teacher) do
    length(teacher.student_ids) < 5
  end

  def claim_student(roster, teacher_id, student_id) do
    Dict.update!(roster, teacher_id, fn(t) ->
      %{t | student_ids: t.student_ids ++ [student_id]}
    end)
  end

  def stats(roster) do
    %{
      total: length(Dict.values(roster)),
    }
  end
end
iex(1)> alias ElixirChat.TeacherRoster, as: TeacherRoster
nil

iex(2)> roster = TeacherRoster.new
#HashDict<[]>

iex(3)> TeacherRoster.add(roster, %{id: 1})
#HashDict<[{1, %{id: 1, student_ids: []}}]>

iex(4)> roster
#HashDict<[]>

iex(5)> roster = TeacherRoster.add(roster, %{id: 1})
#HashDict<[{1, %{id: 1, student_ids: []}}]>

iex(6)> roster
#HashDict<[{1, %{id: 1, student_ids: []}}]>

iex(7)> roster = TeacherRoster.add(roster, %{id: 2})
#HashDict<[{2, %{id: 2, student_ids: []}}, {1, %{id: 1, student_ids: []}}]>

iex(8)> TeacherRoster.stats(roster)
%{total: 2}

iex(9)> TeacherRoster.new
         |> TeacherRoster.add(%{id: 10})
         |> TeacherRoster.add(%{id: 20})
         |> TeacherRoster.add(%{id: 30})
         |> TeacherRoster.stats
%{total: 3}
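For readers coming from Node, the session above can be imitated in JavaScript. The sketch below is only an analogy with hypothetical function names: it copies the roster on every `add` and freezes the result, so the original value stays unchanged in the same way `roster` did in iex(4).

```javascript
// A JavaScript imitation of the immutable roster; add() never mutates its input.
function newRoster() {
  return Object.freeze({});
}

function add(roster, teacher) {
  var next = {};
  Object.keys(roster).forEach(function(id) { next[id] = roster[id]; });
  next[teacher.id] = Object.freeze({ id: teacher.id, studentIds: [] });
  return Object.freeze(next);
}

function stats(roster) {
  return { total: Object.keys(roster).length };
}

var empty = newRoster();
var one   = add(empty, { id: 1 });

console.log(stats(empty).total); // 0: the original roster is untouched
console.log(stats(one).total);   // 1
```

As in the Elixir session, the caller has to keep hold of the returned value, which raises the question of who owns the latest one.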

Where is the current state?

defmodule Actor do
  def start do
    spawn fn -> loop(initial_state) end
  end

  def initial_state do
    # TODO: set some initial state
  end

  def loop(state) do
    new_state = receive do
      # TODO: handle some messages
    end

    loop(new_state)
  end
end
defmodule Actor do
  def start do
    spawn fn -> loop(initial_state) end
  end

  def initial_state do
    0
  end

  def loop(state) do
    new_state = receive do
      {:add,      amount} -> state + amount
      {:subtract, amount} -> state - amount
      {:print           } -> IO.puts(state); state
    end

    loop(new_state)
  end
end
iex(1)> pid = Actor.start
#PID<0.167.0>

iex(2)> send pid, {:print}
0

iex(3)> send pid, {:add, 10}

iex(4)> send pid, {:add, 90}

iex(5)> send pid, {:print}
100

iex(6)> send pid, {:subtract, 50}

iex(7)> send pid, {:print}
50
defmodule Actor do
  # client process
  def add(pid, amount) do
    send pid, {:add, amount}
    pid
  end

  def subtract(pid, amount) do
    send pid, {:subtract, amount}
    pid
  end

  def print(pid) do
    send pid, {:print}
    pid
  end
  
  # server process
  def start do
    spawn fn -> loop(initial_state) end
  end

  def initial_state do
    0
  end

  def loop(state) do
    # Same as before
  end
end
  
  iex(1)> Actor.start
          |> Actor.add(10)
          |> Actor.add(90)
          |> Actor.subtract(20)
          |> Actor.print

  80
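The same counter can be sketched in JavaScript to show the shape of the pattern: private state, a mailbox, and one message handled at a time. The names here are hypothetical, and the sketch runs on a single thread, so it models only the message-handling loop, not the concurrency of a real Erlang process.

```javascript
// Toy actor: private state, a mailbox, and one message handled at a time.
// Hypothetical names; a real Erlang process runs concurrently, this one does not.
function startActor(initialState, handle) {
  var state = initialState;
  var mailbox = [];
  var running = false;

  return {
    send: function(message) {
      mailbox.push(message);
      if (running) { return; } // already draining; the message waits its turn
      running = true;
      while (mailbox.length > 0) {
        state = handle(state, mailbox.shift());
      }
      running = false;
    }
  };
}

var printed = [];
var counter = startActor(0, function(state, message) {
  switch (message.type) {
    case 'add':      return state + message.amount;
    case 'subtract': return state - message.amount;
    case 'print':    printed.push(state); return state;
    default:         return state; // ignore unknown messages
  }
});

counter.send({ type: 'add', amount: 10 });
counter.send({ type: 'add', amount: 90 });
counter.send({ type: 'subtract', amount: 20 });
counter.send({ type: 'print' });

console.log(printed); // [ 80 ]
```

Nothing outside the actor can read or write `state` directly; the only way in is a message.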
defmodule ElixirChat.TeacherRoster do
  def new do
  end

  def add(roster, teacher) do
  end

  def find(roster, teacher_id) do
  end

  def can_accept_more_students?(teacher) do
  end

  def claim_student(roster, teacher_id, student_id) do
  end

  def stats(roster) do
  end
end

OTP GenServer

  • Robust Actor implementation
  • Casts
    • Fire and forget
    • Client continues
  • Calls
    • Request/Response
    • Client blocks
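The difference between a cast and a call can be sketched with a small synchronous toy. This is not how GenServer is implemented; `startServer`, `handleCast`, and `handleCall` are hypothetical names chosen to echo the OTP callbacks, and the "blocking" call is modeled by simply handling all earlier messages before replying.

```javascript
// Toy GenServer (hypothetical, synchronous): casts are queued, calls wait for a reply.
function startServer(initialState, handlers) {
  var state = initialState;
  var mailbox = [];

  function drain() {
    while (mailbox.length > 0) {
      state = handlers.handleCast(mailbox.shift(), state);
    }
  }

  return {
    // Fire and forget: enqueue the message and return nothing.
    cast: function(message) {
      mailbox.push(message);
    },
    // Request/response: earlier messages are handled first, then the caller gets a reply.
    call: function(message) {
      drain();
      var result = handlers.handleCall(message, state);
      state = result.state;
      return result.reply;
    }
  };
}

var roster = startServer([], {
  handleCast: function(message, teachers) {
    return teachers.concat([message.teacher]);
  },
  handleCall: function(message, teachers) {
    return { reply: teachers.length, state: teachers };
  }
});

var castResult = roster.cast({ teacher: { id: 1 } });
roster.cast({ teacher: { id: 2 } });

console.log(castResult);           // undefined: the client did not wait for anything
console.log(roster.call('total')); // 2: both casts were handled before the call
```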
defmodule ElixirChat.TeacherRosterServer do
  use GenServer
  alias ElixirChat.TeacherRoster, as: Roster

  # client process
  def start_link do
    GenServer.start_link(__MODULE__, nil, name: :teacher_roster_server)
  end

  def add(teacher) do
    GenServer.call(:teacher_roster_server, {:add, teacher})
  end

  def can_accept_more_students?(teacher_id) do
    GenServer.call(:teacher_roster_server, {:can_accept_more_students, teacher_id})
  end

  # server process
  def init(_) do
    {:ok, Roster.new}
  end

  def handle_call({:add, teacher}, _from, roster) do
    roster = Roster.add(roster, teacher)
    {:reply, teacher, roster}
  end

  def handle_call({:can_accept_more_students, teacher_id}, _from, roster) do
    teacher = Roster.find(roster, teacher_id)
    result  = Roster.can_accept_more_students?(teacher)
    {:reply, result, roster}
  end
end
defmodule ElixirChat.TeacherRosterServer do
  use ExActor.GenServer, export: :teacher_roster_server
  alias ElixirChat.TeacherRoster, as: Roster

  defstart start_link do
    Roster.new |> initial_state
  end

  defcall add(teacher), state: roster do
    roster |> Roster.add(teacher) |> set_and_reply(teacher)
  end

  defcall can_accept_more_students?(teacher_id), state: roster  do
    teacher = Roster.find(roster, teacher_id)
    teacher |> Roster.can_accept_more_students? |> reply
  end
end

Student Roster

  • Which students are connected?
  • Who is the next student in the queue? 
  • Which students are chatting?
function StudentRoster() { this.students = {}; }

// add, find, remove, stats, chatFinished omitted

StudentRoster.prototype.next = function(callback) {
  var students = _.values(this.students);
  var next = _.find(students, function(student) {
    return student.status === 'waiting';
  });
  callback(null, next);
};

StudentRoster.prototype.assignTo = function(studentId, teacherId, callback) {
  this.find(studentId, function(err, student) {
    student.status    = 'chatting';
    student.teacherId = teacherId;
    callback(null, student);
  });
};
defmodule ElixirChat.StudentRoster do

  # new, add, remove, stats, chat_finished omitted

  def next_waiting(roster) do
    Dict.values(roster)
      |> Enum.filter(fn(s) -> s.status == "waiting" end)
      |> Enum.sort_by(fn(s) -> s.id end)
      |> Enum.at(0)
  end

  def assign_to(roster, teacher_id, student_id) do
    Dict.update!(roster, student_id, fn(s) ->
      # also leave the waiting queue, as the Node version does
      %{s | status: "chatting", teacher_id: teacher_id}
    end)
  end
end
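The Elixir `next_waiting` sorts by id before picking a student, which the Node `next` shown earlier does not. A minimal JavaScript sketch of the same selection rule, without any library, might look like the following; it assumes numeric ids and a roster stored as a plain object keyed by id, and `nextWaiting` is a name made up for this example.

```javascript
// Mirrors next_waiting: the lowest-id student whose status is "waiting".
// Assumes numeric ids and a plain object keyed by student id.
function nextWaiting(roster) {
  var waiting = Object.keys(roster)
    .map(function(id) { return roster[id]; })
    .filter(function(student) { return student.status === 'waiting'; })
    .sort(function(a, b) { return a.id - b.id; });

  return waiting.length > 0 ? waiting[0] : null;
}

var students = {
  3: { id: 3, status: 'waiting' },
  1: { id: 1, status: 'chatting' },
  2: { id: 2, status: 'waiting' }
};

console.log(nextWaiting(students).id); // 2
console.log(nextWaiting({}));          // null
```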

Chat Log

  • Which chats are in progress?
  • What was said in each chat?

Chat Lifetime

  • Matches a teacher with the next student
  • Creates a new chat
  • Ends a chat
function ChatLifetime(teachers, students, chatLog) {
  this.teachers = teachers;
  this.students = students;
  this.chatLog  = chatLog;
}

ChatLifetime.prototype.createChatForNextStudent = function(teacherId, callback) {
  var _this = this;
  _this.teachers.canAcceptMoreStudents(teacherId, function(err, canAccept) {

    if(canAccept) {
      _this.students.next(function(err, student) {

        if(student) {
          _this.students.assignTo(student.id, teacherId, function(err) {
            _this.teachers.claimStudent(teacherId, student.id, function(err) {
              _this.chatLog.new(teacherId, student.id, function(err, chat) {
                callback(null, chat);
              });
            });
          });
        }
      });
    }
  });
};
defmodule ElixirChat.ChatLifetimeServer do
  use ExActor.GenServer, export: :chat_lifetime_server
  alias ElixirChat.ChatLogServer, as: Chats
  alias ElixirChat.TeacherRosterServer, as: Teachers
  alias ElixirChat.StudentRosterServer, as: Students

  defcall create_chat_for_next_student(teacher_id), state: _ do
    # bind the result of the if expression instead of rebinding chat inside it
    chat =
      if Teachers.can_accept_more_students?(teacher_id) do
        student_id = Students.next_student(teacher_id)

        if student_id do
          :ok = Students.assign_student_to_teacher(student_id, teacher_id)
          :ok = Teachers.claim_student(teacher_id, student_id)
          Chats.new(teacher_id, student_id)
        end
      end

    reply(chat)
  end
end

WebSocket Layer

Presence Channel

  • Knows when users connect and disconnect
  • Knows when a teacher starts a new chat
  • Broadcasts student queue length
function PresenceChannel(faye) {
  // continued
  this.chatLifetime = new ChatLifetime(this.teachers, this.students, this.chatLog)
  this.chatChannel  = new ChatChannel(this.faye, this.chatLog, this.chatLifetime)
}

PresenceChannel.prototype.attach = function() {
  // continued
  this.faye.subscribe('/presence/claim_student', this.onClaimStudent.bind(this));
}

PresenceChannel.prototype.onClaimStudent = function(payload) {
  var _this = this;
  _this.chatLifetime.createChatForNextStudent(payload.teacherId, function(err, chat) {
    _this.chatChannel.attach(chat.id);
    _this.publishNewChat(chat);
  });
};

PresenceChannel.prototype.publishNewChat = function(chat) {
  var teacherChannel = "/presence/new_chat/teacher/" + chat.teacherId;
  var studentChannel = "/presence/new_chat/student/" + chat.studentId;

  this.faye.publish(teacherChannel, chat.teacherChannels);
  this.faye.publish(studentChannel, chat.studentChannels);
};
defmodule ElixirChat.PresenceChannel do
  use Phoenix.Channel
  alias ElixirChat.ChatLifetimeServer, as: Chats
  alias ElixirChat.TeacherRosterServer, as: Teachers
  alias ElixirChat.StudentRosterServer, as: Students

  def join(socket, topic, %{"userId" => id, "role" => "teacher"}) do
    Teachers.add(%{id: id})
    socket = assign(socket, :id, id)
    {:ok, socket}
  end

  def join(socket, topic, %{"userId" => id, "role" => "student"}) do
    Students.add(%{id: id})
    socket = assign(socket, :id, id)
    broadcast_status
    {:ok, socket}
  end

  def broadcast_status do
    data = %{
      teachers: Teachers.stats,
      students: Students.stats
    }

    broadcast "presence", "teachers", "user:status", data
  end
end
defmodule ElixirChat.PresenceChannel do
  def leave(socket, _message) do
    Students.remove(socket.assigns[:id])
    broadcast_status
    socket
  end

  def event(socket, "claim:student", %{"teacherId" => teacher_id}) do
    chat = Chats.create_chat_for_next_student(teacher_id)

    if chat do
      reply     socket,     "new:chat:#{chat.teacher_id}", chat
      broadcast "presence", "student:#{chat.student_id}", "new:chat", chat
    end

    socket
  end
end

Chat Channel

  • Private channel for a single chat
  • Relays messages between student and teacher
  • Terminates a chat

Student Client

  • Connect
  • Wait for a teacher to start a chat
  • Reply to every teacher message
  • Disconnect when teacher ends chat
var client = new faye.Client(url);

client.subscribe('/presence/new_chat/student/' + id, function(chat) {
  var messageCount = 0;

  client.subscribe(chat.receiveChannel, function(data) {
    messageCount++;

    client.publish(chat.sendChannel, {
      message: 'Message #' + messageCount + ' from student ' + id
    });
  });

  client.subscribe(chat.terminatedChannel, function(data) {
    client.publish('/presence/student/disconnect', {
      userId: id,
      role:   'student'
    });

    client.disconnect();
  });

  client.publish(chat.joinedChannel, { userId: id });
});

client.publish('/presence/student/connect', {
  userId: id,
  role:   'student'
});
var socket  = new Phoenix.Socket(url);
var student = {userId: id, role: 'student'};

socket.join("presence", "student:" + id, student, function(channel) {
  channel.on("new:chat", function(chat) {
    socket.join("chats", chat.id, student, function(chatChan) {
      var messageCount = 0;

      chatChan.on("chat:terminated", function(data) {
        channel.leave();
        socket.close();
      });

      chatChan.on("student:receive", function(data) {
        messageCount++;

        chatChan.send("student:send", {
          message: "Message #" + messageCount + " from student: " + id
        });
      });

      chatChan.send("student:joined", {});
    });
  });

  channel.send("student:ready", {userId: id});
});

Teacher Client

  • Connect
  • Start a chat with a student (up to 5)
  • Reply to every student message
  • End chat after 50 messages

Review

  • Concurrency without callbacks
  • Synchronization without locks
  • Scalability across cores or machines

Fault Tolerance

The safest way to respond to a thrown error is to shut down the process

OTP Supervisor

 

  • Restarts processes when they die
  • Several restart strategies
defmodule ElixirChat.ModelSupervisor do
  use Supervisor

  def start_link do
    Supervisor.start_link(__MODULE__, [])
  end

  def init([]) do
    children = [
      worker(ElixirChat.ChatLifetimeServer, []),
      worker(ElixirChat.ChatLogServer, []),
      worker(ElixirChat.TeacherRosterServer, []),
      worker(ElixirChat.StudentRosterServer, []),
    ]

    supervise(children, strategy: :one_for_one)
  end
end
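The restart idea can be sketched in JavaScript for a single worker. This is a toy under stated assumptions, not OTP: `supervise` and `startCounter` are hypothetical names, a "crash" is modeled as a thrown exception, and the restart limit is a simple counter with no time window.

```javascript
// Toy restart-on-crash supervisor for one worker (hypothetical names).
function supervise(startWorker, maxRestarts) {
  var worker = startWorker();
  var restarts = 0;

  return {
    send: function(message) {
      try {
        return worker.handle(message);
      } catch (error) {
        if (restarts >= maxRestarts) { throw error; } // give up past the restart limit
        restarts += 1;
        worker = startWorker(); // fresh state; the offending message is dropped
        return undefined;
      }
    },
    restartCount: function() { return restarts; }
  };
}

function startCounter() {
  var count = 0;
  return {
    handle: function(message) {
      if (typeof message !== 'number') { throw new Error('bad message'); }
      count += message;
      return count;
    }
  };
}

var supervised = supervise(startCounter, 3);

console.log(supervised.send(5));        // 5
console.log(supervised.send('oops'));   // undefined: worker crashed and was restarted
console.log(supervised.send(2));        // 2: the new worker started from a clean state
console.log(supervised.restartCount()); // 1
```

The worker contains no defensive code for bad input; recovery is the supervisor's job, which is the division of labor the slide describes.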

Performance

  • 1000 students
  • 10 teachers
  • ~225,000 messages

MacBook Pro, 1 core (Jan 2014)

  • Node:  42s
  • Elixir:   54s

MacBook Pro, 1 core (April 2014)

  • Node:  41s
  • Elixir:   29s

MacBook Pro, 8 cores

  • Node:  41s
  • Elixir:   24s

MacBook Pro, 8 cores, split clients

  • Node:  41s
  • Elixir:   14s
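The timings above are easier to compare as ratios. The snippet below only restates the numbers from the slides and divides them; the `ratio` helper is introduced here for illustration, and a value above 1 means the Elixir run finished sooner.

```javascript
// Timings (seconds) copied from the slides above.
var runs = [
  { setup: '1 core (Jan 2014)',      node: 42, elixir: 54 },
  { setup: '1 core (April 2014)',    node: 41, elixir: 29 },
  { setup: '8 cores',                node: 41, elixir: 24 },
  { setup: '8 cores, split clients', node: 41, elixir: 14 }
];

function ratio(run) {
  // Node time divided by Elixir time, rounded to two decimals.
  return Math.round((run.node / run.elixir) * 100) / 100;
}

runs.forEach(function(run) {
  console.log(run.setup + ': ' + ratio(run) + 'x');
});
// 1 core (Jan 2014): 0.78x
// 1 core (April 2014): 1.41x
// 8 cores: 1.71x
// 8 cores, split clients: 2.93x
```

The Node time barely moves between 1 and 8 cores, which is what a single-threaded process would be expected to show.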

Scaling

(in January)

80 hours later...

  • Storing application state in Redis
  • Using sticky Load Balancer Sessions
  • Manually routing chats to different processes
  • Fighting race conditions

After...

Load Balancer

Presence

Chat 1

Chat 2

Chat 8

...

Redis

Scalable Node, 8 cores, split clients

  • Node:  25s
  • Elixir:   14s

to

  • Rewriting internal tools in Elixir
  • Replacing Node PubSub with Elixir/Phoenix

Thank you!

Chris Geihsler

@seejee

 
