Distributed Computing in NodeJS
Who We Are
- Online By Design LLC
- Jedediah Smith
- Tom Kunzler
What We Are Doing
- EZR Stocks
- http://ezrstocks.com
- Stock Analysis
- Using distributed processing for back-end work
Distribution Levels
- Within Single Server
- Between Servers
- Between Data Centers / Locations
- Between Planets
- Between Galaxies
Possible Distribution Use Cases
- Web server with heavy CPU load
- Advanced business logic
- Background processes
- Ex. processing uploaded images
- Lots of users with each needing
a moderate amount of CPU time
- Back-end data processing
- Generating regular results using complex
data algorithms
Node Uses Processes
- New instance of V8
- Does not share memory
- 30+ ms startup time each
- 10+ MB memory each
- Not designed for thousands of instances
- Don't create one for each connection
(Unless you know your user base and
have the resources to commit to it)
- A common pattern is to create one process
per core.
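The startup and memory figures above depend on the Node version and hardware, so they are worth measuring locally. A minimal sketch using only built-in modules: it launches a bare Node process with `spawnSync`, times it, and has the child report its own resident memory. `measureChildStartup` is a hypothetical helper name; `os.cpus().length` is the usual way to count cores for the one-process-per-core pattern (newer Node releases also offer `os.availableParallelism()`).

```javascript
const { spawnSync } = require('child_process');
const os = require('os');

// Launch a fresh Node process that prints its own resident memory (bytes)
// and report how long the launch took, including process exit.
function measureChildStartup() {
  const start = process.hrtime.bigint();
  const result = spawnSync(process.execPath, ['-e', 'console.log(process.memoryUsage().rss)'], {
    encoding: 'utf8',
  });
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  if (result.status !== 0) {
    throw new Error('child failed: ' + (result.error || result.stderr));
  }
  return { elapsedMs, rssBytes: Number(result.stdout.trim()) };
}

// One process per core: count the cores the OS reports
const cores = os.cpus().length;
const { elapsedMs, rssBytes } = measureChildStartup();
console.log(`cores: ${cores}`);
console.log(`child startup: ${elapsedMs.toFixed(1)} ms, rss: ${(rssBytes / 1024 / 1024).toFixed(1)} MB`);
```

Running it a few times gives a rough per-process cost to multiply by the number of processes you plan to start.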
Within the Server
- child_process
- cluster
child_process
- spawn
- exec
- fork
spawn
- Launches an external command
- Returns a ChildProcess whose stdout/stderr are streams
- Lots of data (and/or)
- Long running with regular data output
exec
- Also launches an external command
- Buffers the output and passes it to a callback
- Get everything when done
- Limited amounts of data (default maxBuffer: 200K in older Node versions, 1 MB in current ones)
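To make the spawn/exec difference concrete, the sketch below uses the current Node binary (`process.execPath`) as the child command so it runs without any external tools. `spawnLines` and `execOnce` are hypothetical helper names; `spawn`, `exec`, and the `maxBuffer` option are the standard `child_process` API.

```javascript
const { spawn, exec } = require('child_process');

// spawn: the child's stdout is a stream, so output can be processed
// chunk by chunk while the child is still running.
function spawnLines(script, onLine, onExit) {
  const child = spawn(process.execPath, ['-e', script]);
  let pending = '';
  child.stdout.setEncoding('utf8');
  child.stdout.on('data', (chunk) => {
    pending += chunk;
    const lines = pending.split('\n');
    pending = lines.pop(); // keep any incomplete trailing line for the next chunk
    lines.forEach(onLine);
  });
  child.on('close', (code) => {
    if (pending) onLine(pending);
    onExit(code);
  });
}

// exec: runs the command in a shell and buffers all output, handing it to the
// callback once the child exits (or failing if output exceeds maxBuffer).
function execOnce(command, callback) {
  exec(command, { maxBuffer: 1024 * 1024 }, (err, stdout) => callback(err, stdout));
}

spawnLines('for (let i = 0; i < 3; i++) console.log("tick " + i)',
  (line) => console.log('spawn got:', line),
  (code) => console.log('spawn exited with', code));

execOnce(`"${process.execPath}" -e "console.log(6 * 7)"`, (err, stdout) => {
  if (err) throw err;
  console.log('exec got:', stdout.trim());
});
```

With `spawn`, lines arrive while the child is still running; with `exec`, nothing is available until the child exits, and output past `maxBuffer` terminates the child with an error.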
fork
- Another NodeJS script
- Allows event-based communication
- var child = require('child_process').fork('script');
- From Parent
- child.send(message[, sendHandle])
- child.on('message', function(message[, socket]) {})
- From Child
- process.send(message);
- process.on('message', function(message) {})
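A minimal round trip over the IPC channel that `fork` sets up: the parent sends a message, the child answers with `process.send`. This is a sketch, not the presenters' code; the `{ value }`/`{ result }` message shape and `squareInChild` are hypothetical, and the child script is written to a temp file only so the example is self-contained. Passing `execArgv: []` keeps the child from inheriting the parent's Node flags.

```javascript
const { fork } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Child script: square each number it receives and send the result back
const childSource = `
process.on('message', (msg) => {
  process.send({ result: msg.value * msg.value });
});
`;

// Hypothetical helper: fork the child, send one message, resolve with the reply
function squareInChild(value) {
  return new Promise((resolve, reject) => {
    // Temp file only so this example is self-contained
    const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'fork-demo-'));
    const file = path.join(dir, 'child.js');
    fs.writeFileSync(file, childSource);
    const cleanup = () => fs.rmSync(dir, { recursive: true, force: true });

    // execArgv: [] so the child does not inherit the parent's Node flags
    const child = fork(file, [], { execArgv: [] });
    child.once('error', (err) => {
      cleanup();
      reject(err);
    });
    child.once('message', (msg) => {
      child.disconnect(); // close the IPC channel so the child can exit
      cleanup();
      resolve(msg.result);
    });
    child.send({ value });
  });
}

squareInChild(12).then((result) => console.log('child replied:', result)); // prints "child replied: 144"
```

Disconnecting instead of killing lets the child finish naturally once it has nothing left to listen to.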
cluster
- Based around child_process.fork
- Workers share the server port
- cluster.isMaster (cluster.isPrimary in Node 16+)
- cluster.isWorker
Master
- cluster.setupMaster([settings]) (cluster.setupPrimary in Node 16+)
- cluster.workers
- worker.send(message[, sendHandle])
- worker.kill([signal='SIGTERM'])
- Events
- fork
- online
- listening
- disconnect
- exit
- setup
Worker
- process.send(message)
- worker.disconnect()
- worker.isConnected()
- Events
- message
- online
- listening
- disconnect
- exit
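The master and worker APIs listed above fit together as in this small round trip: the master forks a worker, sends it a message once it is `online`, and kills it after the reply. It is a sketch under assumptions: the `ping`/`pong` messages and `pingOneWorker` are hypothetical, and the worker script lives in a temp file (via the `exec` setting) only so the example is self-contained.

```javascript
const cluster = require('cluster');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Worker script: reply to a 'ping' from the master with a 'pong' and its pid
const workerSource = `
process.on('message', (msg) => {
  if (msg && msg.cmd === 'ping') process.send({ cmd: 'pong', pid: process.pid });
});
`;

// Hypothetical helper: start one worker, ping it, resolve with its reply
function pingOneWorker() {
  return new Promise((resolve) => {
    const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cluster-demo-'));
    const file = path.join(dir, 'worker.js');
    fs.writeFileSync(file, workerSource);

    // setupPrimary is the Node 16+ name; setupMaster is the older alias
    const setup = cluster.setupPrimary || cluster.setupMaster;
    setup.call(cluster, { exec: file, execArgv: [] });

    let reply = null;
    const worker = cluster.fork();
    worker.on('online', () => worker.send({ cmd: 'ping' }));
    worker.on('message', (msg) => {
      if (msg.cmd === 'pong') {
        reply = msg;
        worker.kill(); // default signal: SIGTERM
      }
    });
    worker.on('exit', () => {
      fs.rmSync(dir, { recursive: true, force: true });
      resolve(reply);
    });
  });
}

pingOneWorker().then((reply) => console.log('worker replied:', reply));
```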
Multi-process Web Server
Reasons to use
- High Traffic
- Mid to high CPU usage
Things to be aware of
- Increased memory requirements
- Again, no shared state in memory
Example:
Multi-process Web Server
// Naive cluster example
var cluster = require('cluster');
// Get the number of CPUs/Cores on server
var numCPUs = require('os').cpus().length;
// When script is run initially it will be the master
if (cluster.isMaster) {
  // Start a new fork for each CPU
  for (var i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
  cluster.on('exit', function(worker, code, signal) {
    console.log('worker ' + worker.process.pid + ' died');
  });
} else { // If script was called from cluster.fork() it is not the master
  // Run Server
  require("./app.js");
}
Between Servers
- Support security
- Communicate between servers
- Inside a network
- Across networks
- Be very robust
- Handle network issues
- Automatically reconnect
- Deal with
- partial packets
- packet loss
- ...
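Partial packets are the first thing a hand-rolled protocol has to handle, because TCP delivers a byte stream rather than discrete messages. One common approach, shown here as a sketch rather than anything the slides prescribe, is newline-delimited JSON: `JSON.stringify` escapes newlines inside strings, so a raw `\n` can only ever mark the end of a message. `createLineFramer` and `frame` are hypothetical names.

```javascript
// Newline-delimited JSON framing: one 'data' event may hold half a message
// or several messages at once, so buffer until a full line is available.
function createLineFramer(onMessage) {
  let buffered = '';
  return function push(chunk) {
    buffered += chunk;
    let newline;
    while ((newline = buffered.indexOf('\n')) !== -1) {
      const line = buffered.slice(0, newline);
      buffered = buffered.slice(newline + 1);
      if (line.length > 0) onMessage(JSON.parse(line));
    }
  };
}

// Encode a message for sending: JSON text followed by a newline delimiter
function frame(message) {
  return JSON.stringify(message) + '\n';
}

// Simulate a message split across two chunks, then a chunk holding two messages
const received = [];
const push = createLineFramer((msg) => received.push(msg));
const wire = frame({ id: 1 }) + frame({ id: 2 }) + frame({ id: 3 });
push(wire.slice(0, 5));
push(wire.slice(5, 12));
push(wire.slice(12));
console.log(received); // [ { id: 1 }, { id: 2 }, { id: 3 } ]
```

With a real `net` socket, calling `socket.setEncoding('utf8')` before feeding chunks to the framer avoids splitting multi-byte characters across chunk boundaries.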
Build Your Own
- Node has the built-in net module: require('net')
- Creates TCP servers and connections
- You have to handle everything on
the previous slide
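A rough sketch of the build-your-own route with `net`: a client that reconnects after a fixed delay whenever the connection closes and reassembles newline-delimited JSON messages. The retry delay, the message format, and the `connectWithRetry` name are assumptions; a production version would also need backoff limits, timeouts, and encryption (for example the `tls` module instead of `net`).

```javascript
const net = require('net');

// Hypothetical helper: keep a TCP connection to host:port, reconnecting
// `retryMs` after every close, and deliver newline-delimited JSON messages.
function connectWithRetry({ host, port, retryMs = 500, onMessage, onConnect }) {
  let stopped = false;
  let socket = null;

  function connect() {
    if (stopped) return;
    let buffered = ''; // partial message carried over between 'data' events
    socket = net.connect({ host, port });
    socket.setEncoding('utf8');
    socket.on('connect', () => {
      if (onConnect) onConnect(socket);
    });
    socket.on('data', (chunk) => {
      buffered += chunk;
      let newline;
      while ((newline = buffered.indexOf('\n')) !== -1) {
        const line = buffered.slice(0, newline);
        buffered = buffered.slice(newline + 1);
        if (line) onMessage(JSON.parse(line), socket);
      }
    });
    // Errors (refused, reset, ...) are followed by 'close', which retries
    socket.on('error', () => {});
    socket.on('close', () => {
      if (!stopped) setTimeout(connect, retryMs);
    });
  }

  connect();
  return {
    stop() {
      stopped = true;
      if (socket) socket.destroy();
    },
  };
}

// Demo: a local server that drops the first connection, then greets the next one
let connections = 0;
const server = net.createServer((conn) => {
  connections += 1;
  conn.on('error', () => {}); // ignore resets from clients going away
  if (connections === 1) {
    conn.destroy(); // simulate a network failure
  } else {
    conn.end(JSON.stringify({ hello: 'runner', attempt: connections }) + '\n');
  }
});

server.listen(0, '127.0.0.1', () => {
  const client = connectWithRetry({
    host: '127.0.0.1',
    port: server.address().port,
    retryMs: 100,
    onMessage(msg) {
      console.log('received', msg);
      client.stop();
      server.close();
    },
  });
});
```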
Or you can use something already out there
ØMQ (ZeroMQ)
- Lightweight message queue for communicating
between machines, letting you deal with data
- Allows communicating between languages
- Handles connectivity issues automatically
- Designed with the idea of no shared state
- One end needs to be stable (usually the end that binds)
- Many different communication methods
- Pub/Sub
- Req/Rep
- Exclusive Pair
- Router/Dealer
- Can also be used within a single server
Socket.IO
- Client / Server architecture
- Handles connectivity issues automatically
- Allows communicating between node processes
and/or the browser
- Communication methods
- Pub/Sub
- Req/Rep
Our Work Queue
- Designed for back-end data processing
- Socket.IO
- Event based
- MongoDB backend
- Allows Multiple Task Sources
- Dependency management allowing tasks to depend on the completion of other tasks
- Req/Rep
- Allows adding new task code without restart
- Existing scripts can have a wrapper added
to make them a task or task source
https://github.com/onlinebydesign/taskmaster
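The dependency management listed above comes down to a selection rule: a task may run once every task it depends on has completed. A minimal sketch over plain objects shaped like the task schema on the following slides. It assumes `dependencies` holds task ids (short strings here instead of MongoDB ObjectIds), that a higher `priority` number is more urgent, and that ties go to the oldest task; the slides do not specify any of these.

```javascript
// Pick the next runnable task: unassigned, with all dependencies completed.
// Assumptions: `dependencies` lists task _id values; higher priority runs first;
// ties go to the oldest task (smallest `created`).
function nextRunnableTask(tasks) {
  const completed = new Set(
    tasks.filter((t) => t.assigned && t.assigned.completed).map((t) => t._id)
  );
  const runnable = tasks.filter(
    (t) => !(t.assigned && t.assigned.who) &&
      (t.dependencies || []).every((dep) => completed.has(dep))
  );
  runnable.sort((a, b) => b.priority - a.priority || a.created - b.created);
  return runnable[0] || null;
}

// Hypothetical task list: 'analyze' waits on 'clean', which waits on 'fetch'
const tasks = [
  { _id: 'fetch', created: 1, priority: 10, assigned: { who: 'w1', completed: 1700000000 }, dependencies: [] },
  { _id: 'analyze', created: 2, priority: 50, assigned: {}, dependencies: ['fetch', 'clean'] },
  { _id: 'clean', created: 3, priority: 25, assigned: {}, dependencies: ['fetch'] },
  { _id: 'report', created: 4, priority: 25, assigned: {}, dependencies: [] },
];

console.log(nextRunnableTask(tasks)._id); // prints "clean"
```

The high-priority `analyze` task is skipped until `clean` completes; after that, it outranks `report`.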
[Architecture diagram: Master, Task Workers, Feeder; links labeled ØMQ and fork]
Master (Model)
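// Variables
var tasks = [];
var idleWorkers = [];
// Connect to database and get the list of tasks
var mongoose = require('mongoose');
mongoose.connect('mongodb://localhost/taskrunner');
var db = mongoose.connection;
var Task = require('../models/tasks');
db.once('open', function () {
  // Promise form: newer Mongoose releases (7+) no longer accept a callback here
  Task.find().exec().then(function (docs) {
    tasks = docs;
  });
});
// Task Model Schema (presumably the ../models/tasks module required above)
var mongoose = require('mongoose');
var Schema = mongoose.Schema;
module.exports = mongoose.model('Task', new Schema({
  "created": Number,
  "priority": Number,
  "assigned": {
    "who": String,
    "when": Number,
    "completed": Number
  },
  "module": String,
  "params": [Schema.Types.Mixed],
  "dependencies": [Schema.Types.Mixed]
}));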
Master (Sockets)
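// Uses `tasks` and `idleWorkers` from the model slide
var config = { port: 3232 }; // assumed: the slides do not show where config is loaded
var io = require('socket.io')(config.port || 3232);

// Simplified selection: first unassigned task (ignores priority and dependencies)
function nextTask() {
  return tasks.find(function (t) { return !(t.assigned && t.assigned.who); });
}

// When a runner establishes/reestablishes connection
io.on('connection', function (socket) {
  // When the runner requests a task
  socket.on('task:request', function () {
    var task = nextTask();
    if (!task) {
      // If there are no tasks add worker to the idleWorkers array.
      idleWorkers.push(socket);
      return;
    }
    // A task is available: mark it as assigned, save it, then send it to the worker.
    task.assigned = { who: socket.id, when: Date.now() };
    task.save().then(function () {
      socket.emit('task:send', JSON.stringify(task));
    });
  });
  // When a runner worker finishes a task
  socket.on('task:done', function (taskJSON) {
    // Update the task and save it to the database.
  });
  // When a runner wants to add tasks to the list
  socket.on('task:add', function (newTasks) {
    // Add the tasks to the tasks array and the database, then assign a task.
  });
});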
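The master only reacts to three events, so its queueing logic can be tried without Socket.IO or MongoDB. In this sketch plain `EventEmitter`s stand in for runner sockets (one emitter carries both directions, which a real socket would not) and tasks live in an array; `createDispatcher`, `idleCount`, and the task module names are hypothetical. It shows the idle-worker handoff from the comments above: a request with nothing queued parks the runner, and a later `task:add` wakes it.

```javascript
const { EventEmitter } = require('events');

// Hypothetical in-memory stand-in for the master's socket handlers.
function createDispatcher() {
  const pending = [];     // tasks waiting to be assigned (FIFO, no priorities)
  const idleWorkers = []; // runners that asked for work when none was available
  const completed = [];   // module names of finished tasks

  function assign(runner) {
    const task = pending.shift();
    if (!task) {
      // No tasks: park the runner until new tasks arrive
      if (!idleWorkers.includes(runner)) idleWorkers.push(runner);
      return;
    }
    task.assigned = { who: runner.id, when: Date.now() };
    runner.emit('task:send', JSON.stringify(task));
  }

  function connect(runner) {
    runner.on('task:request', () => assign(runner));
    runner.on('task:done', (taskJSON) => {
      completed.push(JSON.parse(taskJSON).module);
    });
    runner.on('task:add', (newTasks) => {
      pending.push(...[].concat(newTasks));
      // Wake parked runners while there is work for them
      while (idleWorkers.length > 0 && pending.length > 0) {
        assign(idleWorkers.shift());
      }
    });
  }

  return { connect, completed, idleCount: () => idleWorkers.length };
}

// Usage: one runner that completes each task immediately and asks for more
const dispatcher = createDispatcher();
const runner = new EventEmitter();
runner.id = 'runner-1'; // hypothetical id (Socket.IO sockets have a socket.id)
runner.on('task:send', (taskJSON) => {
  runner.emit('task:done', taskJSON);
  runner.emit('task:request'); // like Worker.prototype.done asking for the next task
});
dispatcher.connect(runner);

runner.emit('task:request'); // queue is empty, so the runner is parked as idle
runner.emit('task:add', [{ module: 'sampleTask' }, { module: 'importTask' }]);
console.log(dispatcher.completed); // [ 'sampleTask', 'importTask' ]
```

After both tasks finish, the runner's final request finds nothing queued, so it ends up parked as idle again.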
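Worker Object
var EventEmitter = require('events').EventEmitter;
var util = require('util');
var path = require('path');

var Worker = function (options) {
  EventEmitter.call(this); // Initialize the EventEmitter so on()/emit() work
  options = options || {};
  this.taskFolder = options.taskFolder || 'tasks'; // assumed default folder name
  this.isIdle = true;
};
util.inherits(Worker, EventEmitter);

Worker.prototype.run = function (task) {
  this.isIdle = false;
  this.task = task;
  var taskScript = require(path.join(__dirname, '..', this.taskFolder, task.module));
  taskScript.apply(this, task.params);
};

// Called by a task when it succeeds
Worker.prototype.done = function (result) {
  this.isIdle = true;
  this.emit('task:done', this.task, result);
  this.emit('task:request');
};

// Called by a task when it fails ('task:error' is an assumed event name)
Worker.prototype.error = function (err) {
  this.isIdle = true;
  this.emit('task:error', this.task, err);
  this.emit('task:request');
};

Worker.prototype.add = function (newTask) {
  this.emit('task:add', newTask);
};

module.exports = Worker;
Worker Runner
var Worker = require('./worker');
var worker = new Worker();
var socket = require('socket.io-client')('http://somemaster:3232');

// Forward the worker's events to the master
worker.on('task:request', function () { socket.emit('task:request'); });
worker.on('task:done', function (task) { socket.emit('task:done', JSON.stringify(task)); });
worker.on('task:add', function (newTask) { socket.emit('task:add', newTask); });
worker.on('task:error', function (task, err) { console.error('task failed', task.module, err); });

/**
 * When the connection is established/re-established ask master for a new task if the worker is idle.
 */
socket.on('connect', function () {
  console.log('connected to master');
  if (worker.isIdle) {
    socket.emit('task:request');
  }
});

/**
 * When the connection is disconnected we log it
 */
socket.on('disconnect', function () {
  console.warn('disconnected from master');
});

/**
 * When the master sends a task, run it on the worker
 */
socket.on('task:send', function (taskJSON) {
  var taskParsed = JSON.parse(taskJSON);
  worker.run(taskParsed);
});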
Sample Tasks
// Example 1: a placeholder task that finishes after 10 seconds
module.exports = function (params) {
  setTimeout((function () {
    this.done('Done with import task!');
  }).bind(this), 10000);
};

// Example 2 (separate file): a wrapper around an existing promise-based script
var preBuiltTask = require('./preBuiltTask');
module.exports = function (taskOptions) {
  var ticker = taskOptions.ticker;
  var worker = this;
  preBuiltTask.run(ticker).then(function (message) {
    worker.done(message);
  }, function (err) {
    worker.error(err);
  });
};
Feeder
var ee = require("events").EventEmitter;
var util = require("util");

var Feeder = function (options) {
  ee.call(this); // Initialize the EventEmitter
  options = options || {};
  // Default matches the master's default port (3232)
  options.master = options.master || 'http://localhost:3232';
  this.socket = require('socket.io-client')(options.master);
  /**
   * When the master reports that tasks were added, re-emit the event locally
   */
  this.socket.on('task:added', function (tasks) {
    this.emit('task:added', tasks);
  }.bind(this));
};
util.inherits(Feeder, ee); // Inherit the ee.prototype into this

/**
 * Add a task or tasks.
 */
Feeder.prototype.add = function (newTasks) {
  this.socket.emit('task:add', newTasks);
};

module.exports = Feeder;
Feeder Usage
var feederOptions = {
  master: 'http://somemaster:3232'
};
var Feeder = require('./feeder');
var feeder = new Feeder(feederOptions);
// Hypothetical ticker list; the slide does not show where tickerList comes from
var tickerList = ['AAPL', 'MSFT', 'GOOG'];
var batch = tickerList.map(function (ticker) {
  return {module: 'sampleTask', priority: 25, params: [{ticker: ticker}]};
});
feeder.add(batch);
Other Options For Single or Multiple Web Servers
- PM2 - Node process manager to help with clustering
- NGINX - Web server with a load balancer; works
great with NodeJS
- HAProxy - Open source proxy server (think load
balancer)
Distributed Computing in NodeJS
By Jedediah Smith
NodeJS runs your JavaScript on a single thread. For a non-trivial web server, or for anything CPU intensive, you will need more than that single thread. In this presentation we will cover clusters and child processes for gaining access to additional cores on the same server. We will then cover distributing work between multiple servers using a message queue such as ZeroMQ.