Detangling multithreading in Node.js

Multi what?

 

Multithreading: A technique by which a single set of code can be used by several processors at different stages of execution.

Isn't Node.js a multithreaded runtime already?

Node.js is a single-threaded JavaScript runtime which in the background uses multiple threads to execute asynchronous code.

JS code and C++ code

Node.js has two types of threads: one Event Loop and k Workers. The Event Loop is responsible for JavaScript callbacks and non-blocking I/O, and a Worker executes tasks corresponding to C++ code that completes an asynchronous request, including blocking I/O and CPU-intensive work.
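A minimal sketch of that split, assuming crypto.pbkdf2 as the task handed to a Worker (it is one of the core APIs Node.js runs on the Worker Pool). The name hashWhileTicking and the password, salt and iteration count are made up for illustration. While the hash is computed off the main thread, a timer on the Event Loop keeps firing.

```javascript
const crypto = require('crypto')

// Hypothetical helper: hashes on the Worker Pool and counts how many
// timer callbacks the Event Loop managed to run in the meantime.
function hashWhileTicking (callback) {
  let ticks = 0
  const timer = setInterval(() => { ticks++ }, 1)

  // The async variant runs the C++ hashing code on a Worker thread.
  crypto.pbkdf2('secret', 'salt', 200000, 64, 'sha512', (err, key) => {
    clearInterval(timer)
    callback(err, { ticks, keyLength: key ? key.length : 0 })
  })
}

hashWhileTicking((err, result) => {
  if (err) throw err
  console.log(`Event Loop ticked ${result.ticks} times during the hash`)
})
```

Swapping in crypto.pbkdf2Sync would do the same work on the main thread, and the timer would not fire until it finished.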

The world-famous event loop

Then, why should I care?

A few operations running in the main process can block the event loop, and when they do they degrade the performance of the whole Node.js application.
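A small sketch of what "blocking" means in practice, with a busy-wait loop standing in for any heavy synchronous operation. The function name measureTimerDelay is an assumption made for this example. A timer scheduled for 0 ms cannot fire until the synchronous work is over.

```javascript
// Hypothetical helper: schedules a 0 ms timer, then blocks the main thread
// for `blockMs` milliseconds and reports how late the timer actually fired.
function measureTimerDelay (blockMs, callback) {
  const scheduledAt = Date.now()
  setTimeout(() => callback(Date.now() - scheduledAt), 0)

  // Busy-wait: nothing else can run on the Event Loop until this ends.
  const until = Date.now() + blockMs
  while (Date.now() < until) {}
}

measureTimerDelay(200, (delay) => {
  console.log(`0 ms timer fired after ${delay} ms`)
})
```

Every pending callback, including incoming requests, waits in the same way.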

Blocking operations

Computational complexity

A.K.A.: How well a particular algorithm or data structure could scale.

Some Node.js examples

  • Complex calculations
  • ReDoS: a "vulnerable" regular expression run against a crafted input
  • JSON DoS: JSON.stringify on a big object (or JSON.parse on a big string)
  • Sync methods from the core APIs: crypto, zlib, fs and child_process

Some code

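// ReDoS example: the nested quantifier ([a-z]+)+ backtracks exponentially
// when the input almost matches (only letters, then a trailing '!')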
const r = /([a-z]+)+$/
const s = 'aaaaaaaaaaaaaaaaaaaaaaaaaaaa!'

console.log('Running regular expression... please wait')
console.time('benchmark')
r.test(s)
console.timeEnd('benchmark')


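// JSON DoS: stringifying a very large object blocks the Event Loop
let obj = { a: 1 }
const niter = 20

for (let i = 0; i < niter; i++) {
  obj = { obj1: obj, obj2: obj } // Doubles in size each iter
}

const before = process.hrtime()
const res = JSON.stringify(obj)
const [seconds, nanoseconds] = process.hrtime(before) // [s, ns] tuple
console.log(`JSON.stringify took ${seconds}s ${Math.round(nanoseconds / 1e6)}ms for ${res.length} characters`)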

Alternatives to avoid multithreading in Node.js

Splitting the load

Split long operations into smaller steps using asynchronous methods (like setImmediate), so that each step runs on the Event Loop but regularly yields to let other events be processed.
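A minimal sketch of this partitioning, using a plain sum as a stand-in for the heavy work. The name sumInChunks and the chunk size are assumptions chosen for illustration.

```javascript
// Hypothetical helper: sums 0..n-1 in chunks of `chunkSize` iterations,
// yielding to the Event Loop between chunks with setImmediate.
function sumInChunks (n, chunkSize, done) {
  let sum = 0
  let i = 0

  function step () {
    const end = Math.min(i + chunkSize, n)
    for (; i < end; i++) sum += i

    if (i < n) {
      setImmediate(step) // let pending I/O and timers run first
    } else {
      done(sum)
    }
  }

  setImmediate(step) // start asynchronously so `done` is never called in sync
}

sumInChunks(1e6, 1e4, (sum) => console.log('sum =', sum))
```

The total work is the same; it just no longer happens in one uninterrupted run, so other callbacks get a turn between chunks.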

Go native

 

Create a native C++ addon and move the complex operations out of the main thread. Not suitable for many cases.

https://github.com/workshopper/goingnative

Multithreading your operations

Main issues to solve when multithreading

  • Shared data and resources
  • Communication and synchronization
  • Error handling

Using child_process

Launches any type of process and provides an event-emitter interface for it:

  • spawn: executes a command in a new process (no shell by default) and streams back its output in real time
  • exec: launches a shell, executes a command and waits for it to finish, then hands back its buffered output
  • execFile: executes a file without spawning a shell by default and waits for it to finish, then hands back its buffered output
  • fork: variation of spawn that starts a new Node.js process with an IPC channel, so the parent and the child process can exchange messages
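A short sketch contrasting the buffered and the streamed styles from the list above. It runs the current Node.js binary with --version so that it needs no external command. The function names versionViaExecFile and versionViaSpawn are assumptions made for this example.

```javascript
const { execFile, spawn } = require('child_process')
const { promisify } = require('util')

const execFileAsync = promisify(execFile)

// execFile: no shell, output is buffered and delivered once the child exits.
async function versionViaExecFile () {
  const { stdout } = await execFileAsync(process.execPath, ['--version'])
  return stdout.trim()
}

// spawn: output arrives chunk by chunk through the stdout stream.
function versionViaSpawn () {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ['--version'])
    let output = ''
    child.stdout.on('data', (chunk) => { output += chunk })
    child.on('error', reject)
    child.on('close', (code) => {
      if (code === 0) resolve(output.trim())
      else reject(new Error(`child exited with code ${code}`))
    })
  })
}

versionViaExecFile().then((v) => console.log('execFile:', v))
versionViaSpawn().then((v) => console.log('spawn:', v))
```

fork works like spawn but needs a separate module file for the child, which is why it is left out of this single-file sketch.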

Using cluster

The cluster module lets us create worker processes that all share the same server ports. It also sets up an IPC channel between the master and each worker process, so they can pass JavaScript objects to each other.

* Warning: don't create a cluster for your entire server; an external cluster manager and a load balancer are preferred for production environments, for performance and debuggability.

 

Using worker_threads

Workers (threads) are useful for performing CPU-intensive JavaScript operations. Unlike child_process or cluster, worker_threads can share memory. They do so by transferring ArrayBuffer instances or sharing SharedArrayBuffer instances.

 

* Warning: Available from Node.js 10.5 (behind the --experimental-worker flag) and experimental at the time; stable since Node.js 12.
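A minimal sketch of a worker sharing memory with the main thread, assuming a naive Fibonacci as the CPU-intensive job. The name fibInWorker is made up for this example, and the worker source is inlined as a string (with the eval option) only to keep it in a single file.

```javascript
const { Worker } = require('worker_threads')

// Worker source, inlined as a string so the sketch needs no second file.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads')
  // Naive recursive Fibonacci: deliberately CPU-heavy.
  const fib = (n) => (n < 2 ? n : fib(n - 1) + fib(n - 2))
  // Write the result straight into the memory shared with the main thread.
  const shared = new Int32Array(workerData.buffer)
  Atomics.store(shared, 0, fib(workerData.n))
  parentPort.postMessage('done')
`

// Hypothetical helper: computes fib(n) in a worker via a SharedArrayBuffer.
function fibInWorker (n) {
  return new Promise((resolve, reject) => {
    const buffer = new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT)
    const worker = new Worker(workerSource, {
      eval: true,
      workerData: { n, buffer } // the SharedArrayBuffer is shared, not copied
    })
    worker.once('message', () => {
      resolve(Atomics.load(new Int32Array(buffer), 0))
    })
    worker.once('error', reject)
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`))
    })
  })
}

fibInWorker(30).then((value) => console.log('fib(30) =', value))
```

The message only signals completion; the result itself travels through the shared buffer, which is the part child_process and cluster cannot do.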

Modules doing the heavy-lifting

Let's check some code

Go ahead and learn

Resources

Thanks
