Distributed Computing in NodeJS

Who We Are

  • Online By Design LLC
  • Jedediah Smith
  • Tom Kunzler

What We Are Doing

  • EZR Stocks
  • http://ezrstocks.com
  • Stock Analysis
  • Using Distributed Processing for back-end processing

Distribution Levels

  • Within Single Server
  • Between Servers
  • Between Data Centers / Locations
  • Between Planets
  • Between Galaxies

Possible Distribution Use Cases

  • Web server with heavy CPU load
    • Advanced business logic
    • Background processes
      • Ex. processing uploaded images
    • Lots of users, each needing
      a moderate amount of CPU time
  • Back-end data processing
    • Generating regular results using complex
      data algorithms

Node Uses Processes

  • New instance of V8
  • Does not share memory
  • > 30 ms startup time each
  • > 10 MB memory each
  • Not designed for thousands of instances
  • Don't create one for each connection
    (Unless you know your user base and
    have the resources to commit to it)
  • A common pattern is to create one process
    per core.

Within the Server

  • child_process
  • cluster

child_process

  • spawn
  • exec
  • fork

spawn

  • Launches an external command
  • Returns a stream
  • Suited to lots of data and/or
    long-running commands with regular data output
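
A minimal sketch of the streaming behaviour described above. The helper name runStreaming and the demo child script are illustrative assumptions, not part of the original talk; only child_process.spawn and its stream events come from Node itself.

```javascript
const { spawn } = require('child_process');

// Run a command and pass each chunk of stdout to onChunk as it arrives.
// onDone(err, exitCode) is called exactly once when the process finishes.
function runStreaming(command, args, onChunk, onDone) {
  // stdin is ignored, stdout is piped to us, stderr goes to the parent's stderr
  const child = spawn(command, args, { stdio: ['ignore', 'pipe', 'inherit'] });
  let finished = false;

  function finish(err, code) {
    if (finished) return; // 'error' and 'close' can both fire; report once
    finished = true;
    onDone(err, code);
  }

  child.stdout.setEncoding('utf8');
  child.stdout.on('data', onChunk);
  child.on('error', function (err) { finish(err); });
  child.on('close', function (code) { finish(null, code); });
  return child;
}

// Demo: the external command is another node process that prints three lines.
const demoScript = 'for (let i = 1; i <= 3; i++) console.log("line " + i)';
runStreaming(process.execPath, ['-e', demoScript],
  function (chunk) { process.stdout.write(chunk); },
  function (err, code) {
    console.log(err ? 'failed: ' + err.message : 'exit code ' + code);
  });
```

Because output is handled chunk by chunk, nothing has to be held in memory until the command ends, which is why spawn fits the large-output and long-running cases.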

exec

  • Also launches an external command
  • Returns a buffer
  • Get everything when done
  • Limited amounts of data (maxBuffer option;
    default 200 KB in older Node releases,
    1 MB in current ones)
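
A small sketch of the buffered behaviour, assuming a hypothetical wrapper named runBuffered. The maxBuffer option is part of child_process.exec; the command being run is only a stand-in.

```javascript
const { exec } = require('child_process');

// Run a shell command and receive all of its output at once.
// maxBytes caps how much output is collected; going over it
// terminates the child and reports an error to the callback.
function runBuffered(command, maxBytes, callback) {
  exec(command, { maxBuffer: maxBytes }, function (err, stdout, stderr) {
    callback(err, stdout, stderr);
  });
}

// Demo: ask another node process to evaluate and print an expression.
const nodeBin = '"' + process.execPath + '"';
runBuffered(nodeBin + ' -p "6 * 7"', 200 * 1024, function (err, stdout) {
  if (err) return console.error('command failed:', err.message);
  console.log('result:', stdout.trim());
});
```

If the command may produce more output than the limit, spawn is usually the safer choice.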

fork

  • Another NodeJS script
  • Allows event-based communication
  • var child = require('child_process').fork('script');
  • From Server
    • child.send(message[, sendHandle])
    • child.on('message', function(message[, socket]) {})
  • From Child
    • process.send(message);
    • process.on('message', function(message) {})
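
The message exchange above can be sketched in one file as follows. This is an illustration under assumptions: the helper doubleInChild and the child's doubling logic are made up for the example, and the child script is written to a temporary file only so the sketch is self-contained (normally it would be its own .js file).

```javascript
const { fork } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Source of the child script: reply to every message with the value doubled.
const childSource = [
  "process.on('message', function (msg) {",
  "  process.send({ result: msg.value * 2 });",
  "});"
].join('\n');

function doubleInChild(value, callback) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'fork-demo-'));
  const childPath = path.join(dir, 'child.js');
  fs.writeFileSync(childPath, childSource);

  // execArgv: [] keeps the parent's own node flags out of the child
  const child = fork(childPath, [], { execArgv: [] });

  child.once('message', function (reply) {
    child.disconnect(); // close the IPC channel so the child can exit
    callback(null, reply.result);
  });
  child.once('error', callback);
  child.once('exit', function () {
    // remove the temporary script once the child is gone
    fs.unlinkSync(childPath);
    fs.rmdirSync(dir);
  });

  child.send({ value: value });
}

doubleInChild(21, function (err, result) {
  if (err) throw err;
  console.log('child replied:', result);
});
```

Messages are serialized between the two processes, so only plain data should be sent, not functions or objects that rely on shared references.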

cluster

  • Based around child_process.fork
  • Workers share a port
  • cluster.isMaster
  • cluster.isWorker


Master

  • cluster.setupMaster([settings])
  • cluster.workers
    • worker.send(message[, sendHandle])
    • worker.kill([signal='SIGTERM'])
  • Events
    • fork
    • online
    • listening
    • disconnect
    • exit
    • setup

Worker

  • process.send(message)
  • worker.disconnect()
  • worker.isConnected()
  • Events
    • message
    • online
    • listening
    • disconnect
    • exit

Multi-process Web Server

Reasons to use

  • High Traffic
  • Mid to high CPU usage

Things to be aware of

  • Increased memory requirements
  • Again no shared state in memory

Example: Multi-process Web Server

// Naive cluster example
var cluster = require('cluster');

// Get the number of CPUs/Cores on server
var numCPUs = require('os').cpus().length;

// When script is run initially it will be the master
if (cluster.isMaster) {
    // Start a new fork for each CPU
    for (var i = 0; i < numCPUs; i++) {
      cluster.fork();
    }

    cluster.on('exit', function(worker, code, signal) {
      console.log('worker ' + worker.process.pid + ' died');
    });
} else { // If script was called from cluster.fork() it is not the master
    
    // Run Server
    require("./app.js");
}

Between Servers

  • Allow security
  • Communicate between servers
    • Inside network
    • Between networks
  • Be very robust
    • Handle network issues
      • Automatically reconnect
      • Deal with
        • partial packets
        • packet loss
        • ...

Build Your Own

  • Node has require('net');
  • Creates network connections
  • You have to handle everything on
    the previous slide
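
As one example of what "handle everything" means, here is a sketch of dealing with partial packets by framing messages as newline-delimited JSON. The function names createFramer and encode are assumptions made for this illustration; reconnection, security, and the other concerns still need their own handling.

```javascript
// Serialize a message as one line of JSON. JSON.stringify escapes any
// newline inside the data, so a raw '\n' only ever marks the end of a message.
function encode(message) {
  return JSON.stringify(message) + '\n';
}

// Returns a function to feed with incoming string chunks. Chunks may contain
// half a message or several messages; onMessage only sees complete ones.
function createFramer(onMessage) {
  let buffered = '';
  return function feed(chunk) {
    buffered += chunk;
    let newline;
    while ((newline = buffered.indexOf('\n')) !== -1) {
      const line = buffered.slice(0, newline);
      buffered = buffered.slice(newline + 1);
      if (line.length > 0) {
        onMessage(JSON.parse(line));
      }
    }
  };
}
```

With a socket from require('net'), this could be wired up by calling socket.setEncoding('utf8') and then socket.on('data', createFramer(handler)), and by writing with socket.write(encode(message)).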

Or you can use something already out there

ØMQ (ZeroMQ)

  • Lightweight message queue for communicating
    between machines, so you can focus on the data
  • Allows communicating between languages
  • Handles connectivity issues automatically
  • Designed with the idea of no shared state
  • One end of each connection needs to be stable
  • Many different communication methods
    • Pub/Sub
    • Req/Rep
    • Exclusive Pair
    • Router/Dealer
  • Can also be used within a single server

Socket.IO

  • Client / Server architecture
  • Handles connectivity issues automatically
  • Allows communicating between node processes
    and/or the browser
  • Communication methods
    • Pub/Sub
    • Req/Rep

Our Work Queue

  • Designed for back-end data processing
  • Socket.IO
  • Event based
  • MongoDB backend
  • Allows Multiple Task Sources
  • Dependency management allowing tasks to depend on the completion of other tasks
  • Req/Rep
  • Allows adding new task code without restart
  • Existing scripts can have a wrapper added 
    to make them a task or task source

https://github.com/onlinebydesign/taskmaster
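
The dependency management idea can be sketched as a selection function. This is not taken from the taskmaster repository. It assumes that each entry in a task's dependencies array is the _id of another task, that a larger priority number means more urgent, and that ties go to the older task; the name nextRunnableTask is hypothetical.

```javascript
// Pick the next task that is unassigned and whose dependencies have all
// completed. Returns null when nothing can run yet.
function nextRunnableTask(tasks) {
  // Ids of every task that has a completion timestamp
  const completed = new Set(
    tasks
      .filter(function (t) { return t.assigned && t.assigned.completed; })
      .map(function (t) { return String(t._id); })
  );

  const runnable = tasks.filter(function (t) {
    const unassigned = !(t.assigned && t.assigned.who);
    const depsDone = (t.dependencies || []).every(function (dep) {
      return completed.has(String(dep));
    });
    return unassigned && depsDone;
  });

  // Assumption: higher priority first, then oldest first
  runnable.sort(function (a, b) {
    return (b.priority - a.priority) || (a.created - b.created);
  });

  return runnable.length > 0 ? runnable[0] : null;
}
```

A task with a high priority still waits until everything it depends on has a completed timestamp, so priority only orders the tasks that are actually ready.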

[Diagram: Master, Feeder, Workers, and Tasks, with connections labeled ØMQ and fork]

Master (Model)

// Variables
var tasks = [];
var idleWorkers = [];

// Connect to database and get the list of tasks
var db = require('mongoose').connect('mongodb://localhost/taskrunner').connection;
var Task = require('../models/tasks');
db.once('open', function () {
    Task.find(function (err, docs) {
        tasks = docs;
    });
});

// Task Model Schema
new Schema({
    "created": Number,
    "priority": Number,
    "assigned": {
        "who": String,
        "when": Number,
        "completed": Number
    },
    "module": String,
    "params": [Schema.Types.Mixed],
    "dependencies": [Schema.Types.Mixed]
});

Master (Sockets)

    var io = require('socket.io')(config.port || 3232);

    // When a runner establishes/reestablishes connection
    io.on('connection', function (socket) {

        // When the runner requests a task
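        socket.on('task:request', function () {
            // Find an unassigned task (selection logic simplified for the slide).
            var task = tasks.filter(function (t) {
                return !(t.assigned && t.assigned.who);
            })[0];

            // If there are no tasks add worker to the idleWorkers array.
            if (!task) {
                return idleWorkers.push(socket);
            }

            // If a task is available then send to the worker. Mark the task as assigned.
            task.assigned = {who: socket.id, when: Date.now()};
            task.save(function (err) {
                socket.emit('task:send', JSON.stringify(task));
            });
        });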

        // When a runner worker finishes a task
        socket.on('task:done', function (taskJSON) {
            // Update the task and save it to the database.
        });

        // When a runner wants to add tasks to the list
        socket.on('task:add', function (tasks) {
            // Add the tasks to the tasks object and the database then assign a task.
        });
    });
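
The interplay between the task list and the idleWorkers array can be sketched without a database or a network. The createDispatcher helper is an assumption for illustration only; it works with any object that has on() and emit(), such as a Socket.IO socket or a plain EventEmitter in a test.

```javascript
// In-memory sketch of the master's dispatch logic: hand out tasks on request,
// remember workers that asked when the queue was empty, and wake them up
// when new tasks are added.
function createDispatcher() {
  const pending = [];     // tasks waiting for a worker
  const idleWorkers = []; // sockets waiting for a task

  function assign(socket, task) {
    task.assigned = { who: socket.id, when: Date.now() };
    socket.emit('task:send', JSON.stringify(task));
  }

  return {
    // Register the request handler for one connected worker
    attach: function (socket) {
      socket.on('task:request', function () {
        if (pending.length > 0) {
          assign(socket, pending.shift());
        } else if (idleWorkers.indexOf(socket) === -1) {
          idleWorkers.push(socket);
        }
      });
    },
    // Add new tasks, giving them to idle workers first
    add: function (newTasks) {
      newTasks.forEach(function (task) {
        if (idleWorkers.length > 0) {
          assign(idleWorkers.shift(), task);
        } else {
          pending.push(task);
        }
      });
    }
  };
}
```

In the master this would be called as dispatcher.attach(socket) inside the connection handler and dispatcher.add(tasks) inside the task:add handler; persistence to MongoDB is left out of the sketch.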
    

Worker Object

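var EventEmitter = require('events').EventEmitter;
var util = require('util');

var Worker = function (options) {
    EventEmitter.call(this); // Initialize the EventEmitter so this.emit() works
    this.isIdle = true;
};

util.inherits(Worker, EventEmitter);

Worker.prototype.run = function (task) {
    this.isIdle = false;
    // taskFolder is not defined on this slide; presumably it comes from configuration
    var taskScript = require('../' + taskFolder + '/' + task.module);
    taskScript.apply(this, task.params);
};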

Worker.prototype.done = function () {
    this.isIdle = true;
    this.emit('task:done');
    this.emit('task:request');
};

Worker.prototype.add = function (newTask) {
    this.emit('task:add', newTask);
};

module.exports = Worker;

Worker Runner

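var Worker = require('./worker');
var worker = new Worker();
var socket = require('socket.io-client')('http://somemaster:3232');
var Log = console; // Stand-in: the Log object used below is not shown in the original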

/**
 * When the connection is established/re-established ask master for a new task per idleWorker.
 */
socket.on('connect', function () {
    Log.info('connected to master');
    if (worker.isIdle) {
        socket.emit('task:request');
    }
});

/**
 * When the connection is disconnected we log it
 */
socket.on('disconnect', function () {
    Log.warn('disconnected from master');
});

/**
 * When the master sends a task to the worker, parse it and run it on the worker
 */
socket.on('task:send', function (taskJSON) {
    var taskParsed = JSON.parse(taskJSON);
    worker.run(taskParsed);
});

Sample Tasks

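// Sample task 1: simulates ten seconds of work, then reports completion
module.exports = function (params) {
    setTimeout((function () {
        this.done('Done with import task!');
    }).bind(this), 10000);
};

// Sample task 2 (a separate file): wraps an existing promise-based script
var preBuiltTask = require('./preBuiltTask');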

module.exports = function (taskOptions) {
    var ticker = taskOptions.ticker;
    var worker = this;

    preBuiltTask.run(ticker).then(function (message) {
        worker.done(message);
    }, function (err) {
        worker.error(err);
    });
};

Feeder

var ee = require("events").EventEmitter;
var util = require("util");

var Feeder = function (options) {
    ee.call(this); // Initialize the EventEmitter
    options.master = options.master || 'http://localhost:3000';
    this.socket = require('socket.io-client')(options.master);

    /**
     * When the master reports that tasks were added, re-emit the event locally
     */
    this.socket.on('task:added', function (tasks) {
        this.emit('task:added', tasks);
    }.bind(this));
};

util.inherits(Feeder, ee); // Inherit the ee.prototype into this

/**
 * Add a task or tasks.
 */
Feeder.prototype.add = function (newTasks) {
    this.socket.emit('task:add', newTasks);
};

module.exports = Feeder;

Feeder Usage

var utils = require('../generatorUtilities');
var moment = require('moment-timezone');
var async = require('async');


var feederOptions = {
    master: 'http://somemaster:3232'
};
var Feeder = require('./feeder');
var feeder = new Feeder(feederOptions);


// tickerList is assumed to be an array of ticker symbols loaded elsewhere (not shown)
var batch = tickerList.map(function (ticker) {
    return {module: 'sampleTask', priority: 25, params: [{ticker: ticker}]};
});

feeder.add(batch);

Other Options For Single or Multiple Web Servers

  • PM2 - Node process manager to help with clustering
  • NGINX - Web server with a load balancer; works
    great with NodeJS
  • HAProxy - Open source proxy server (think load
    balancer)