Detangling multithreading in Node.js
Multi what?
Multithreading: A technique by which a single set of code can be used by several processors at different stages of execution.
Isn't Node.js a multithreaded runtime already?
Node.js is a single-threaded JavaScript runtime which in the background uses multiple threads to execute asynchronous code.
JS code and C++ code
Node.js has two types of threads: one Event Loop and k Workers. The Event Loop is responsible for JavaScript callbacks and non-blocking I/O, and a Worker executes tasks corresponding to C++ code that completes an asynchronous request, including blocking I/O and CPU-intensive work.
The world-famous event loop
Then, why should I care?
A few kinds of operations, when run on the main thread, can block the Event Loop and degrade the performance of the whole Node.js application.
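To make the effect visible, here is a minimal sketch that schedules a 0 ms timer and then keeps the main thread busy. The names measureTimerDelay and blockMs are illustrative, not part of any Node.js API. The timer callback cannot run until the synchronous loop returns, so the measured delay is at least blockMs.

```javascript
// Sketch: measure how long a 0 ms timer is delayed by synchronous work.
// `measureTimerDelay` and `blockMs` are illustrative names.
function measureTimerDelay (blockMs) {
  return new Promise((resolve) => {
    const start = Date.now()
    // Ask the Event Loop to run this callback as soon as possible.
    setTimeout(() => resolve(Date.now() - start), 0)
    // Busy-wait on the main thread; the timer cannot fire until this returns.
    while (Date.now() - start < blockMs) { /* burn CPU */ }
  })
}

measureTimerDelay(100).then((delay) => {
  console.log(`0 ms timer fired after ${delay} ms`)
})
```

Any other callback (an incoming HTTP request, a finished file read) would have been delayed in the same way.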
Blocking operations
Computational complexity
A.K.A.: How well a particular algorithm or data structure could scale.
Some Node.js examples
- Complex calculations
- ReDoS: a "vulnerable" regular expression
- JSON DoS: JSON.stringify (or JSON.parse) on a very large object
- Sync methods from the core APIs: crypto, zlib, fs and child_process
Some code
// REDOS Example
const r = /([a-z]+)+$/
const s = 'aaaaaaaaaaaaaaaaaaaaaaaaaaaa!'
console.log('Running regular expression... please wait')
console.time('benchmark')
r.test(s)
console.timeEnd('benchmark')
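// JSON DoS example
let obj = { a: 1 }
const niter = 20
for (let i = 0; i < niter; i++) {
  obj = { obj1: obj, obj2: obj } // Doubles in size on each iteration
}
const before = process.hrtime()
const res = JSON.stringify(obj)
// process.hrtime(before) returns the elapsed time as [seconds, nanoseconds]
const [sec, nanosec] = process.hrtime(before)
const tookMs = sec * 1e3 + nanosec / 1e6
console.log('JSON.stringify took ' + tookMs.toFixed(1) + ' ms for ' + res.length + ' characters')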
Alternatives to avoid multithreading in Node.js
Splitting the load
Split the operation into several steps and schedule each one with an asynchronous method (like setImmediate). Every step still runs on the Event Loop, but between steps the loop gets a chance to process other events.
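A minimal sketch of this partitioning idea, assuming a toy task (summing the integers from 1 to n). The names sumInChunks and chunkSize are illustrative. Each slice does a bounded amount of work and then hands control back with setImmediate.

```javascript
// Sketch: sum 1..n in slices, yielding to the Event Loop between slices.
// `sumInChunks` and `chunkSize` are illustrative names.
function sumInChunks (n, chunkSize = 1e6) {
  return new Promise((resolve) => {
    let total = 0
    let i = 1
    function step () {
      // Process at most `chunkSize` numbers in this slice.
      const end = Math.min(i + chunkSize - 1, n)
      for (; i <= end; i++) total += i
      if (i > n) return resolve(total)
      // Let pending I/O callbacks and timers run before the next slice.
      setImmediate(step)
    }
    step()
  })
}

sumInChunks(5e6).then((total) => console.log('sum:', total))
```

The total amount of work does not shrink; it is only spread out, so this fits tasks that split naturally into independent slices.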
Go native
Create a native C++ addon and move the complex operations out of the main thread. This is not suitable for many cases.
Multithreading your operations
Main issues to solve when multithreading
- Shared data and resources
- Communication and synchronization
- Error handling
Using child_process
Launches any type of process and provides an event-emitter interface for it:
- spawn: launches a command in a new process (no shell by default) and streams back the output in real time
- exec: launches a shell, executes a command and waits for it to finish, returning a buffer with its output
- execFile: does not spawn a shell by default, executes a file and waits for it to finish, returning a buffer with its output
- fork: variation of spawn that starts a new Node.js process with an IPC channel, so the parent and the child can send messages to each other
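As a rough illustration of the list above, the sketch below uses execFile to run a snippet in a second Node.js process and collect its output. The helper name runNodeSnippet is hypothetical; process.execPath and the -e flag are used only so that the example needs no extra files.

```javascript
const { execFile } = require('child_process')

// Sketch: run a JavaScript snippet in a separate Node.js process.
// `runNodeSnippet` is an illustrative helper, not part of child_process.
function runNodeSnippet (source) {
  return new Promise((resolve, reject) => {
    // process.execPath is the absolute path of the running node binary;
    // execFile starts it directly, without going through a shell.
    execFile(process.execPath, ['-e', source], (err, stdout) => {
      if (err) return reject(err)
      resolve(stdout)
    })
  })
}

runNodeSnippet('console.log(6 * 7)').then((out) => {
  console.log('child printed:', out.trim())
})
```

The child has its own memory and its own Event Loop, so heavy work there does not block the parent. The cost is process startup time and the need to serialize anything passed between the two.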
Using cluster
The cluster module lets us create worker processes that all share the same server ports. It also sets up an IPC channel between the master and each worker process, so they can pass JavaScript objects to each other as messages.
* Warning: don't create a cluster for your entire server. For production environments it is preferable to use an external cluster manager and a load balancer, for performance and debuggability reasons.
Using worker_threads
Workers (threads) are useful for performing CPU-intensive JavaScript operations. Unlike child_process or cluster, worker_threads can share memory. They do so by transferring ArrayBuffer instances or sharing SharedArrayBuffer instances.
* Warning: Available from Node.js 10.5.0, where they were experimental (behind the --experimental-worker flag); the module is stable as of Node.js 12.
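A minimal sketch of moving CPU-intensive JavaScript off the main thread with worker_threads. The helper name fibInWorker and the naive Fibonacci task are assumptions made for illustration. The worker source is passed as a string with the eval option only so that the example fits in one file; a real project would more likely point Worker at a separate file.

```javascript
const { Worker } = require('worker_threads')

// Source of the worker, passed as a string so the sketch fits in one file.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads')
  const fib = (n) => (n < 2 ? n : fib(n - 1) + fib(n - 2))
  parentPort.postMessage(fib(workerData))
`

// `fibInWorker` is an illustrative helper name.
function fibInWorker (n) {
  return new Promise((resolve, reject) => {
    // workerData is cloned into the worker; the result comes back as a message.
    const worker = new Worker(workerSource, { eval: true, workerData: n })
    worker.once('message', resolve)
    worker.once('error', reject)
  })
}

fibInWorker(30).then((result) => console.log('fib(30) =', result))
```

While the worker computes, the main thread's Event Loop stays free to handle other callbacks. Plain values such as n are copied between threads; only transferred ArrayBuffer or shared SharedArrayBuffer instances avoid that copy.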
Modules doing the heavy-lifting
Let's check some code
Go ahead and learn
Resources
- https://nodejs.org/es/docs/guides/event-loop-timers-and-nexttick/
- https://hackernoon.com/the-simplicity-of-computational-complexity-street-fighter-ii-vs-the-big-o-67786db7653c
- https://nodejs.org/es/docs/guides/dont-block-the-event-loop/
- https://itnext.io/multi-threading-and-multi-process-in-node-js-ffa5bb5cde98
- https://medium.com/tech-tajawal/clustering-in-nodejs-utilizing-multiple-processor-cores-75d78aeb0f4f
- https://www.youtube.com/watch?v=oIjBlpf_UaA
Thanks
Detangling Multithreading in Node.js
By Adrián Estrada