Distributed Computing in NodeJS

Who We Are

  • Online By Design LLC
  • Jedediah Smith
  • Tom Kunzler

What We Are Doing

  • EZR Stocks
  • http://ezrstocks.com
  • Stock Analysis
  • Using Distributed Processing for back-end processing

Distribution Levels

  • Within Single Server
  • Between Servers
  • Between Data Centers / Locations
  • Between Planets
  • Between Galaxies

Possible Distribution Use Cases

  • Web server with heavy CPU load
    • Advanced business logic
    • Background processes
      • Ex. processing uploaded images
    • Lots of users with each needing
      a moderate amount of CPU time
  • Back-end data processing
    • Generating regular results using complex
      data algorithms

Node Uses Processes

  • New instance of V8
  • Does not share memory
  • >30 ms startup time each
  • >10 MB memory each
  • Not designed for thousands of instances
  • Don't create one for each connection
    (Unless you know your user base and
    have the resources to commit to it)
  • A common pattern is to create one process
    per core.

Within the Server

  • child_process
  • cluster

child_process

  • spawn
  • exec
  • fork

spawn

  • Launches an external command
  • Returns a stream
  • Use for lots of data and/or
    long-running commands with regular data output
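
A minimal sketch of the streaming behavior described above. The helper name `streamCommand` and the command it runs are assumptions chosen for illustration; only `child_process.spawn` and its stream events come from Node itself.

```javascript
var spawn = require('child_process').spawn;

// Run a command and pass each chunk of stdout to onChunk as soon as it arrives.
// onDone(err, exitCode) is called exactly once when the process has finished.
function streamCommand(command, args, onChunk, onDone) {
    var finished = false;
    function finish(err, code) {
        if (finished) { return; }
        finished = true;
        onDone(err, code);
    }

    var child = spawn(command, args);
    child.stdout.on('data', function (chunk) {
        onChunk(chunk.toString());
    });
    child.on('error', function (err) { finish(err); });        // e.g. command not found
    child.on('close', function (code) { finish(null, code); }); // streams closed, process ended
    return child;
}

// Usage: start a second node process that prints three lines
streamCommand(
    process.execPath,
    ['-e', 'for (var i = 0; i < 3; i++) console.log("tick " + i);'],
    function (text) { process.stdout.write('chunk: ' + text); },
    function (err, code) {
        console.log(err ? 'failed: ' + err.message : 'exit code ' + code);
    }
);
```

Because output arrives in chunks, nothing has to be held in memory until the command ends, which is what makes spawn a reasonable fit for large or long-running output.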

exec

  • Also launches an external command
  • Returns a buffer
  • Get everything when done
  • Limited amounts of data (maxBuffer default: 200K in
    older Node versions, 1 MB in current ones)
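
A short sketch of the buffered style, for contrast with spawn. The helper name `runBuffered` and the `echo hello` command are assumptions for illustration; the `maxBuffer` option is set explicitly so the limit does not depend on the Node version.

```javascript
var exec = require('child_process').exec;

// Run a shell command and receive all of its output in one piece when it exits.
function runBuffered(command, callback) {
    // maxBuffer caps how much output is held in memory; exceeding it is reported as an error
    exec(command, { maxBuffer: 1024 * 1024 }, function (err, stdout, stderr) {
        if (err) { return callback(err); }
        callback(null, stdout.trim());
    });
}

// Usage
runBuffered('echo hello', function (err, output) {
    if (err) { return console.error('command failed: ' + err.message); }
    console.log(output);
});
```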

fork

  • Another NodeJS script
  • Allows event-based communication
  • var child = require('child_process').fork('script');
  • From Server
    • child.send(message[, sendHandle])
    • child.on('message', function(message[, socket]) {})
  • From Child
    • process.send(message);
    • process.on('message', function(message) {})

cluster

  • Based around child_process.fork
  • Share port
  • cluster.isMaster
  • cluster.isWorker

 

 

 

 

Master

  • cluster.setupMaster([settings])
  • cluster.workers
    • worker.send(message[, sendHandle])
    • worker.kill([signal='SIGTERM'])
  • Events
    • fork
    • online
    • listening
    • disconnect
    • exit
    • setup

Worker

  • process.send(message)
  • worker.disconnect()
  • worker.isConnected()
  • Events
    • message
    • online
    • listening
    • disconnect
    • exit
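
To make the master and worker APIs above more concrete, here is a sketch that sends one message to each worker and collects the replies. The worker script is hypothetical and is written to a temporary file so that `cluster.setupMaster({ exec: ... })` can point at it; the helper name `doubleInWorkers` is also an assumption.

```javascript
var cluster = require('cluster');
var fs = require('fs');
var os = require('os');
var path = require('path');

// Hypothetical worker script, written to a temp file so the sketch stays in one file.
var workerPath = path.join(os.tmpdir(), 'cluster-worker-sketch-' + process.pid + '.js');
fs.writeFileSync(workerPath, [
    "// Worker side: answer every message from the master",
    "process.on('message', function (message) {",
    "    process.send({ doubled: message.value * 2 });",
    "});"
].join('\n'));

// Remove the temporary script when the master exits
process.on('exit', function () {
    try { fs.unlinkSync(workerPath); } catch (e) { /* already removed */ }
});

// Master side: fork `count` workers, send each a number, collect the replies.
function doubleInWorkers(count, callback) {
    var replies = [];
    var exited = 0;
    cluster.setupMaster({ exec: workerPath }); // named setupPrimary in newer Node versions

    function startWorker(value) {
        var worker = cluster.fork();
        worker.on('online', function () {
            worker.send({ value: value });
        });
        worker.on('message', function (reply) {
            replies.push(reply.doubled);
            worker.kill(); // default signal is SIGTERM
        });
        worker.on('exit', function () {
            exited += 1;
            if (exited === count) { callback(replies); }
        });
    }

    for (var i = 1; i <= count; i++) {
        startWorker(i);
    }
}

// Usage
doubleInWorkers(2, function (replies) {
    console.log('replies from workers:', replies);
});
```

Replies arrive in whatever order the workers finish, so the order of `replies` should not be relied on.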

Multi-process Web Server

Reasons to use

  • High Traffic
  • Mid to high CPU usage

 

Things to be aware of

  • Increased memory requirements
  • Again no shared state in memory

Example: Multi-process Web Server

// Naive cluster example
var cluster = require('cluster');

// Get the number of CPUs/Cores on server
var numCPUs = require('os').cpus().length;

// When script is run initially it will be the master
if (cluster.isMaster) {
    // Start a new fork for each CPU
    for (var i = 0; i < numCPUs; i++) {
        cluster.fork();
    }

    cluster.on('exit', function(worker, code, signal) {
        console.log('worker ' + worker.process.pid + ' died');
    });
} else { // If script was called from cluster.fork() it is not the master
    
    // Run Server
    require("./app.js");
}

Between Servers

  • Allow security
  • Communicate between servers
    • Inside a network
    • Between networks
  • Be very robust
    • Handle network issues
      • Automatically reconnect
      • Deal with
        • partial packets
        • packet loss
        • ...

Build Your Own

  • Node has require('net');
  • Creates network connections
  • You have to handle everything on
    the previous slide
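
As one example of what "handle everything" means, a TCP stream does not preserve message boundaries, so a single `data` event may carry half a message or several at once. The sketch below assumes newline-delimited JSON as the wire format (an assumption, not something the slides specify); `createFrameParser` and `startServer` are hypothetical names. Reconnection and security are not covered here.

```javascript
var net = require('net');

// Returns a function to call with every incoming chunk. It buffers partial
// data and calls onMessage once per complete newline-terminated JSON message.
function createFrameParser(onMessage, onError) {
    var buffered = '';
    return function (chunk) {
        buffered += chunk.toString();
        var newlineIndex;
        while ((newlineIndex = buffered.indexOf('\n')) !== -1) {
            var line = buffered.slice(0, newlineIndex);
            buffered = buffered.slice(newlineIndex + 1);
            if (line.length === 0) { continue; }
            try {
                onMessage(JSON.parse(line));
            } catch (err) {
                if (onError) { onError(err); } // malformed message: report and keep going
            }
        }
    };
}

// How the parser would plug into a server (not started in this sketch).
function startServer(port, onMessage) {
    return net.createServer(function (socket) {
        socket.setEncoding('utf8'); // avoid splitting multi-byte characters across chunks
        socket.on('data', createFrameParser(onMessage));
    }).listen(port);
}

// Usage: the first message is split across two chunks, the second arrives whole
var parse = createFrameParser(function (message) {
    console.log('received', message);
});
parse('{"type":"pi');
parse('ng"}\n{"type":"pong"}\n');
```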

 

Or you can use something already out there

ØMQ (ZeroMQ)

  • Lightweight message queue for communicating
    between machines, letting you focus on the data
  • Allows communicating between languages
  • Handles connectivity issues automatically
  • Designed with the idea of no shared state
  • One end needs to be stable (it binds to a fixed
    address; the other end connects to it)
  • Many different communication methods
    • Pub/Sub
    • Req/Rep
    • Exclusive Pair
    • Router/Dealer
  • Can also be used within a single server

Socket.IO

  • Client / Server architecture
  • Handles connectivity issues automatically
  • Allows communicating between node processes
    and/or the browser
  • Communication methods
    • Pub/Sub
    • Req/Rep

Our Work Queue

  • Designed for back-end data processing
  • Socket.IO
  • Event based
  • MongoDB backend
  • Allows Multiple Task Sources
  • Dependency management allowing tasks to depend on the completion of other tasks
  • Req/Rep
  • Allows adding new task code without restart
  • Existing scripts can have a wrapper added 
    to make them a task or task source

https://github.com/onlinebydesign/taskmaster
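
The dependency management bullet can be illustrated with a small in-memory sketch. It is not taken from the taskmaster repository. It assumes that each task has an `id`, that `dependencies` holds the ids of other tasks, that a higher `priority` number runs first, and that `assigned.completed` is set once a task has finished. The function name `nextRunnableTask` is hypothetical.

```javascript
// Pick the next task that is unassigned and whose dependencies have all completed.
// Ties on priority are broken by creation time (oldest first).
function nextRunnableTask(tasks) {
    var completedIds = {};
    tasks.forEach(function (task) {
        if (task.assigned && task.assigned.completed) {
            completedIds[task.id] = true;
        }
    });

    var runnable = tasks.filter(function (task) {
        var unassigned = !(task.assigned && task.assigned.when);
        var dependenciesMet = (task.dependencies || []).every(function (id) {
            return completedIds[id] === true;
        });
        return unassigned && dependenciesMet;
    });

    runnable.sort(function (a, b) {
        return (b.priority - a.priority) || (a.created - b.created);
    });
    return runnable[0] || null;
}

// Usage: 'report' has the highest priority but must wait for 'analyze'
var exampleTasks = [
    { id: 'import', priority: 10, created: 1, dependencies: [], assigned: { when: 5, completed: 9 } },
    { id: 'report', priority: 50, created: 2, dependencies: ['analyze'] },
    { id: 'analyze', priority: 20, created: 3, dependencies: ['import'] }
];
console.log(nextRunnableTask(exampleTasks).id); // prints "analyze"
```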

[Architecture diagram. Labels: Master, Task, Task, Workers, Feeder, ØMQ, fork]

Master (Model)

// Variables
var tasks = [];
var idleWorkers = [];

// Connect to database and get the list of tasks
var mongoose = require('mongoose');
mongoose.connect('mongodb://localhost/taskrunner');
var db = mongoose.connection;
var Task = require('../models/tasks');
db.once('open', function () {
    Task.find(function (err, docs) {
        if (err) { return console.error(err); }
        tasks = docs;
    });
});

// Task Model Schema
var Schema = require('mongoose').Schema;
new Schema({
    "created": Number,
    "priority": Number,
    "assigned": {
        "who": String,
        "when": Number,
        "completed": Number
    },
    "module": String,
    "params": [Schema.Types.Mixed],
    "dependencies": [Schema.Types.Mixed]
});

Master (Sockets)

    var io = require('socket.io')(config.port || 3232);

    // When a runner establishes/reestablishes connection
    io.on('connection', function (socket) {

        // When the runner requests a task
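        socket.on('task:request', function () {
            // Pick the first unassigned task (priority and dependencies omitted here)
            var task = tasks.filter(function (t) { return !(t.assigned && t.assigned.when); })[0];

            // If there are no tasks add worker to the idleWorkers array.
            if (!task) { return idleWorkers.push(socket); }

            // If a task is available, mark it as assigned and send it to the worker.
            task.assigned = {who: socket.id, when: Date.now()};
            task.save(function (err) {
                if (err) { return console.error(err); }
                socket.emit('task:send', JSON.stringify(task));
            });
        });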

        // When a runner worker finishes a task
        socket.on('task:done', function (taskJSON) {
            // Update the task and save it to the database.
        });

        // When a runner wants to add tasks to the list
        socket.on('task:add', function (tasks) {
            // Add the tasks to the tasks object and the database then assign a task.
        });
    });
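
The hand-off between waiting tasks and idle workers can be tried out without Socket.IO or MongoDB. The sketch below is an in-memory stand-in, not the project's implementation. `createDispatcher` is a hypothetical name, persistence is left out, and a plain `EventEmitter` plays the role of a connected socket.

```javascript
var EventEmitter = require('events').EventEmitter;

function createDispatcher() {
    var pending = [];     // tasks waiting for a runner
    var idleWorkers = []; // sockets that asked for work when none was available

    function assign(socket, task) {
        task.assigned = { who: socket.id, when: Date.now() };
        socket.emit('task:send', JSON.stringify(task));
    }

    return {
        // Equivalent of the 'task:request' handler
        request: function (socket) {
            if (pending.length > 0) {
                assign(socket, pending.shift());
            } else if (idleWorkers.indexOf(socket) === -1) {
                idleWorkers.push(socket);
            }
        },
        // Equivalent of the 'task:add' handler: queue the tasks, then wake idle workers
        add: function (newTasks) {
            pending = pending.concat(newTasks);
            while (pending.length > 0 && idleWorkers.length > 0) {
                assign(idleWorkers.shift(), pending.shift());
            }
        },
        idleCount: function () { return idleWorkers.length; },
        pendingCount: function () { return pending.length; }
    };
}

// Usage with a stand-in socket
var dispatcher = createDispatcher();
var fakeSocket = new EventEmitter();
fakeSocket.id = 'runner-1';
fakeSocket.on('task:send', function (taskJSON) {
    console.log('runner-1 received', JSON.parse(taskJSON).module);
});
dispatcher.request(fakeSocket);                            // no work yet: runner goes idle
dispatcher.add([{ module: 'sampleTask', params: [] }]);    // idle runner gets the task
```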
    

Worker Object

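var EventEmitter = require('events').EventEmitter;
var util = require('util');

// Folder holding the task modules. The value is a placeholder (assumption);
// the real project is expected to take it from configuration.
var taskFolder = 'tasks';

var Worker = function (options) {
    EventEmitter.call(this); // Initialize the EventEmitter
    this.isIdle = true;
};

util.inherits(Worker, EventEmitter); // Provides the emit() used below

// Load the task module and call it with the worker as `this`
Worker.prototype.run = function (task) {
    this.isIdle = false;
    var taskScript = require('../' + taskFolder + '/' + task.module);
    taskScript.apply(this, task.params);
};

// Called by a task when it finishes: report it and ask for more work
Worker.prototype.done = function () {
    this.isIdle = true;
    this.emit('task:done');
    this.emit('task:request');
};

// Called by a task to queue new tasks
Worker.prototype.add = function (newTask) {
    this.emit('task:add', newTask);
};

module.exports = Worker;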

Worker Runner

var Worker = require('./worker');
var worker = new Worker();
var socket = require('socket.io-client')('http://somemaster:3232');

/**
 * When the connection is established/re-established ask master for a new task per idleWorker.
 */
socket.on('connect', function () {
    Log.info('connected to master');
    if (worker.isIdle) {
        socket.emit('task:request');
    }
});

/**
 * When the connection is disconnected we log it
 */
socket.on('disconnect', function () {
    Log.warn('disconnected from master');
});

/**
 * When the master sends a task, parse it and run it on the worker
 */
socket.on('task:send', function (taskJSON) {
    var taskParsed = JSON.parse(taskJSON);
    worker.run(taskParsed);
});

Sample Tasks

module.exports = function (params) {
    setTimeout((function () {
        this.done('Done with import task!');
    }).bind(this), 10000);
};

// A second sample: wrapping an existing script as a task
var preBuiltTask = require('./preBuiltTask');

module.exports = function (taskOptions) {
    var ticker = taskOptions.ticker;
    var worker = this;

    preBuiltTask.run(ticker).then(function (message) {
        worker.done(message);
    }, function (err) {
        worker.error(err);
    });
};

Feeder

var ee = require("events").EventEmitter;
var util = require("util");

var Feeder = function (options) {
    ee.call(this); // Initialize the EventEmitter
    options.master = options.master || 'http://localhost:3000';
    this.socket = require('socket.io-client')(options.master);

    /**
     * When a runner wants to add tasks to the list
     */
    this.socket.on('task:added', function (tasks) {
        this.emit('task:added', tasks);
    }.bind(this));
};

util.inherits(Feeder, ee); // Inherit the ee.prototype into this

/**
 * Add a task or tasks.
 */
Feeder.prototype.add = function (newTasks) {
    this.socket.emit('task:add', newTasks);
};

module.exports = Feeder;

Feeder Usage

var utils = require('../generatorUtilities');
var moment = require('moment-timezone');
var async = require('async');


var feederOptions = {
    master: 'http://somemaster:3232'
};
var Feeder = require('./feeder');
var feeder = new Feeder(feederOptions);


var batch = tickerList.map(function (ticker) {
    return {module: 'sampleTask', priority: 25, params: [{ticker: ticker}]};
});

feeder.add(batch);

Other Options For Single or Multiple Web Servers

  • PM2 - Node process manager to help with clustering
  • NGINX - Web server with a load balancer; works
    great with NodeJS
  • HAProxy - Open source proxy server (think load
    balancer)

Distributed Computing in NodeJS

By Jedediah Smith

NodeJS runs your code on a single thread. For a non-trivial web server, or for anything CPU-intensive, you will need more than that single thread. In this presentation we cover clusters and child processes for gaining access to additional cores on the same server. We then cover distributing work between multiple servers using a message queue such as ZeroMQ.
