It’s about time to embrace Node.js Streams

Luciano Mammino (@loige)

October 2nd, 2019

// buffer-copy.js

const {
  readFileSync,
  writeFileSync
} = require('fs')

const [,, src, dest] = process.argv

// read entire file content
const content = readFileSync(src)

// write that content somewhere else
writeFileSync(dest, content)

@loige

We do this all the time

and it's ok

but sometimes ...

💥 ERR_FS_FILE_TOO_LARGE! 💥

File size is greater than possible Buffer

But why?

if bytes were blocks...

Mario can lift a few blocks

but not too many...

What can we do if we have to move many blocks?

We CAN move them one by one!

we stream them...

👋 Hello, I am Luciano!

🇮🇹

🇮🇪

🇺🇸

Cloud Architect

Blog: loige.co

Twitter: @loige

GitHub: @lmammino

code: loige.link/streams-examples

loige.link/streams-manc

01. Buffers vs. Streams

buffer: data structure to store and transfer arbitrary binary data

*Note that this is loading all the content of the file in memory
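
To make the memory limit concrete, here is a minimal sketch that prints the largest Buffer the current Node.js runtime can allocate. It relies on `buffer.constants.MAX_LENGTH`; the value differs across Node.js versions and platforms, and Node.js may enforce a lower cap when reading a whole file, so treat the printed number as indicative only. The file name in the comment is made up for illustration.

```javascript
// max-buffer.js (illustrative sketch, not part of the original examples)
const { constants } = require('buffer')

// Upper bound (in bytes) for a single Buffer instance on this runtime
const maxBufferBytes = constants.MAX_LENGTH

console.log(`Largest possible Buffer: ${(maxBufferBytes / 2 ** 30).toFixed(2)} GiB`)
```

Any file that has to be loaded in one go needs to fit within this bound, which is why very large files cannot be copied the buffer way.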

Stream: Abstract interface for working with streaming data

*It does not load all the data straight away
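
A small sketch can show what "not loading all the data straight away" looks like in practice. The helper `countChunks` and the temporary file name are assumptions made for this illustration; the sketch writes a 200 KiB file and counts how many chunks a read stream delivers.

```javascript
// count-chunks.js (illustrative sketch; names and temp file are made up)
const { createReadStream, writeFileSync, unlinkSync } = require('fs')
const { tmpdir } = require('os')
const { join } = require('path')

// Reads a file through a stream and reports how many chunks were delivered
async function countChunks (path) {
  let chunks = 0
  let bytes = 0
  for await (const chunk of createReadStream(path)) {
    chunks++
    bytes += chunk.length
  }
  return { chunks, bytes }
}

// Demo: the file arrives in several chunks instead of one big Buffer
const demoFile = join(tmpdir(), `count-chunks-demo-${process.pid}.bin`)
writeFileSync(demoFile, Buffer.alloc(200 * 1024))
const demo = countChunks(demoFile)
demo.then(({ chunks, bytes }) => {
  console.log(`${bytes} bytes read in ${chunks} chunks`)
  unlinkSync(demoFile)
})
```

Only one chunk needs to be in memory at any time, regardless of the total file size.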

File copy: The buffer way

// buffer-copy.js

const {
  readFileSync,
  writeFileSync
} = require('fs')

const [,, src, dest] = process.argv
const content = readFileSync(src)
writeFileSync(dest, content)

File copy: The stream way

// stream-copy.js

const { 
  createReadStream,
  createWriteStream
} = require('fs')

const [,, src, dest] = process.argv
const srcStream = createReadStream(src)
const destStream = createWriteStream(dest)
srcStream.on('data', (data) => destStream.write(data))

* Careful: this implementation is not optimal (it ignores backpressure, which is covered later)

Memory comparison (~600 MB file)

node --inspect-brk buffer-copy.js assets/poster.psd ~/Downloads/poster.psd

Memory comparison (~600 MB file)

node --inspect-brk stream-copy.js assets/poster.psd ~/Downloads/poster.psd

Let's try with a big file (~10 GB)

node --inspect-brk stream-copy.js assets/the-matrix-hd.mkv ~/Downloads/the-matrix-hd.mkv

👍 streams vs buffers 👎

Streams keep a low memory footprint even with large amounts of data
Streams allow you to process data as soon as it arrives

02. Stream types & APIs

All Streams are event emitters

A stream instance is an object that emits events when its internal state changes, for instance:

s.on('readable', () => {}) // ready to be consumed
s.on('data', (chunk) => {}) // new data is available
s.on('error', (err) => {}) // some error happened
s.on('end', () => {}) // no more data available

The events available depend on the type of stream
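
As a rough illustration of the event-emitter nature of streams, the sketch below records the events emitted by a small in-memory readable stream. `recordEvents` is a hypothetical helper written for this example, and `Readable.from` is used only to get a stream without touching the file system.

```javascript
// stream-events.js (illustrative sketch)
const { Readable } = require('stream')

// Collects the sequence of events emitted by a readable stream
function recordEvents (stream) {
  return new Promise((resolve, reject) => {
    const seen = []
    stream.on('data', chunk => seen.push(`data:${chunk}`))
    stream.on('end', () => {
      seen.push('end')
      resolve(seen)
    })
    stream.on('error', reject)
  })
}

const recorded = recordEvents(Readable.from(['a', 'b']))
recorded.then(seen => console.log(seen))
```

Each chunk produces one data event, and end arrives only after the last chunk has been delivered.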

Readable streams

A readable stream represents a source from which data is consumed.

Examples:

fs readStream
process.stdin
HTTP response (client-side)
HTTP request (server-side)
AWS S3 GetObject (data field)

It supports two modes for data consumption: flowing and paused (or non-flowing) mode.
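
Paused mode can be sketched as follows: instead of receiving chunks through data events, the consumer waits for the readable event and pulls chunks explicitly with read(). The helper name `readAllPaused` is an assumption made for this example.

```javascript
// read-paused.js (illustrative sketch)
const { Readable } = require('stream')

// Consumes a readable stream in paused (non-flowing) mode
function readAllPaused (stream) {
  return new Promise((resolve, reject) => {
    const chunks = []
    stream.on('readable', () => {
      let chunk
      // read() returns null when the internal buffer is empty
      while ((chunk = stream.read()) !== null) {
        chunks.push(chunk)
      }
    })
    stream.on('end', () => resolve(chunks))
    stream.on('error', reject)
  })
}

const pulled = readAllPaused(Readable.from(['a', 'b', 'c']))
pulled.then(chunks => console.log(chunks))
```

In this mode the consumer decides when to pull data, which gives finer control at the cost of slightly more code.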

Readable streams

Data is read from source automatically and chunks are emitted as soon as they are available.

[Diagram: a readable stream in flowing mode repeatedly reads from the source data and passes each chunk to the data listener through a data event; once the source reaches its end, the stream emits end.]

When no more data is available, end is emitted.

Readable streams

Data is read from source automatically and chunks are emitted as soon as they are available.

// count-emojis-flowing.js

const { createReadStream } = require('fs')
const { EMOJI_MAP } = require('emoji') // from npm

const emojis = Object.keys(EMOJI_MAP)

const file = createReadStream(process.argv[2])
let counter = 0

file.on('data', chunk => {
  for (let char of chunk.toString('utf8')) {
    if (emojis.includes(char)) {
      counter++
    }
  }
})
file.on('end', () => console.log(`Found ${counter} emojis`))
file.on('error', err => console.error(`Error reading file: ${err}`))

loige.link/up-emojiart

Readable streams are also Async Iterators (Node.js 10+)

// count-emojis-async-iterator.js
const { createReadStream } = require('fs')
const { EMOJI_MAP } = require('emoji') // from npm

async function main () {
  const emojis = Object.keys(EMOJI_MAP)
  const file = createReadStream(process.argv[2])
  let counter = 0

  for await (let chunk of file) {
    for (let char of chunk.toString('utf8')) {
      if (emojis.includes(char)) {
        counter++
      }
    }
  }

  console.log(`Found ${counter} emojis`)
}

main()

Writable streams

A writable stream is an abstraction that allows you to write data to a destination

Examples:

fs writeStream
process.stdout, process.stderr
HTTP request (client-side)
HTTP response (server-side)
AWS S3 PutObject (body parameter)

// writable-http-request.js
const http = require('http')

const req = http.request(
  {
    hostname: 'enx6b07hdu6cs.x.pipedream.net',
    method: 'POST'
  },
  resp => {
    console.log(`Server responded with "${resp.statusCode}"`)
  }
)

req.on('finish', () => console.log('request sent'))
req.on('close', () => console.log('Connection closed'))
req.on('error', err => console.error(`Request failed: ${err}`))

req.write('writing some content...\n')
req.end('last write & close the stream')

loige.link/writable-http-req

backpressure

When writing large amounts of data, make sure you handle the stop-writing signal (write() returning false) and wait for the drain event before writing again

loige.link/backpressure
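
The stop-writing signal can be observed in isolation with a deliberately slow destination. This is a minimal sketch: `slowDest`, its 4-byte `highWaterMark` and the 10ms delay are arbitrary values chosen to trigger backpressure quickly.

```javascript
// backpressure-signal.js (illustrative sketch)
const { Writable } = require('stream')

// A deliberately slow destination with a tiny internal buffer (4 bytes)
const slowDest = new Writable({
  highWaterMark: 4,
  write (chunk, encoding, done) {
    setTimeout(done, 10) // pretend every write takes 10ms
  }
})

const first = slowDest.write('ab') // 2 bytes buffered: below the limit
const second = slowDest.write('cd') // 4 bytes buffered: limit reached

console.log({ first, second })

// 'drain' fires once the buffered data has been flushed
const drained = new Promise(resolve => slowDest.once('drain', resolve))
drained.then(() => console.log('drained, safe to write again'))
```

Here the first write() returns true and the second returns false, which is the cue to stop writing until drain is emitted.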

// stream-copy-safe.js

const { createReadStream, createWriteStream } = require('fs')

const [, , src, dest] = process.argv
const srcStream = createReadStream(src)
const destStream = createWriteStream(dest)

srcStream.on('data', data => {
  const canContinue = destStream.write(data)
  if (!canContinue) {
    // we are overflowing the destination, we should pause
    srcStream.pause()
    // we will resume when the destination stream is drained
    destStream.once('drain', () => srcStream.resume())
  }
})

Other types of stream

Duplex Stream: streams that are both Readable and Writable (e.g. net.Socket)
Transform Stream: Duplex streams that can modify or transform the data as it is written and read (e.g. zlib.createGzip(), crypto.createCipheriv())

Anatomy of a transform stream

1. write data (writable stream side)
2. transform the data (transform stream)
3. read transformed data (readable stream side)

Gzip example

1. write uncompressed data (writable stream side)
2. compress the data with zlib.createGzip() (transform stream)
3. read compressed data (readable stream side)
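
The three steps of the gzip example can be sketched in a few lines. `gzipText` is a hypothetical helper: it writes text into the gzip transform stream and reads the compressed chunks back from its readable side; `gunzipSync` is used only to check the round trip.

```javascript
// gzip-transform.js (illustrative sketch)
const { Readable } = require('stream')
const { createGzip, gunzipSync } = require('zlib')

// Writes text into a gzip transform stream and collects the compressed output
async function gzipText (text) {
  const gzip = createGzip()
  // 1. write data into the writable side
  Readable.from([Buffer.from(text)]).pipe(gzip)
  // 2. the transform stream compresses the data internally
  // 3. read the transformed data from the readable side
  const chunks = []
  for await (const chunk of gzip) {
    chunks.push(chunk)
  }
  return Buffer.concat(chunks)
}

const compressed = gzipText('hello streams')
compressed.then(buf => console.log(gunzipSync(buf).toString()))
```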

How can we use transform streams?

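[Sequence diagram: Readable → Transform → Writable. Every data event triggers a write() on the next stream in the chain. When a destination cannot keep up (backpressure), the stream feeding it is paused with pause() and resumed with resume() once the destination emits drain.]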

You also have to handle end & error events!

// stream-copy-gzip.js
const { 
  createReadStream,
  createWriteStream
} = require('fs')
const { createGzip } = require('zlib')

const [, , src, dest] = process.argv
const srcStream = createReadStream(src)
const gzipStream = createGzip()
const destStream = createWriteStream(dest)

srcStream.on('data', data => {
  const canContinue = gzipStream.write(data)
  if (!canContinue) {
    srcStream.pause()
    gzipStream.once('drain', () => {
      srcStream.resume()
    })
  }
})

srcStream.on('end', () => {
  // no more input: end the gzip stream so that it flushes what it still buffers
  // (the flushed chunks are delivered to the 'data' listener below)
  gzipStream.end()
})


gzipStream.on('data', data => {
  const canContinue = destStream.write(data)
  if (!canContinue) {
    gzipStream.pause()
    destStream.once('drain', () => {
      gzipStream.resume()
    })
  }
})

gzipStream.on('end', () => {
  destStream.end()
})

// ⚠️ TODO: handle errors!

03. pipe()

readable.pipe(writableDest)

Connects a readable stream to a writable stream
A transform stream can be used as a destination as well
It returns the destination stream allowing for a chain of pipes

readable
  .pipe(transform1)
  .pipe(transform2)
  .pipe(transform3)
  .pipe(writable)

// stream-copy-gzip-pipe.js

const { 
  createReadStream,
  createWriteStream
} = require('fs')
const { createGzip } = require('zlib')

const [, , src, dest] = process.argv
const srcStream = createReadStream(src)
const gzipStream = createGzip()
const destStream = createWriteStream(dest)

srcStream
  .pipe(gzipStream)
  .pipe(destStream)

Set up complex pipelines with pipe

readable
  .pipe(decompress)
  .pipe(decrypt)
  .pipe(convert)
  .pipe(encrypt)
  .pipe(compress)
  .pipe(writeToDisk)

This is the most common way to use streams

Handling errors (correctly)

readable
  .on('error', handleErr)
  .pipe(decompress)
  .on('error', handleErr)
  .pipe(decrypt)
  .on('error', handleErr)
  .pipe(convert)
  .on('error', handleErr)
  .pipe(encrypt)
  .on('error', handleErr)
  .pipe(compress)
  .on('error', handleErr)
  .pipe(writeToDisk)
  .on('error', handleErr)

handleErr should end and destroy the streams

(it doesn't happen automatically)
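
One possible shape for such a handler is sketched below. `createErrorHandler` is a hypothetical helper (not part of Node.js or of the original examples) that destroys every stream in the chain the first time any of them fails; the two `PassThrough` streams are placeholders for a real pipe chain.

```javascript
// handle-err.js (illustrative sketch)
const { PassThrough } = require('stream')

// Builds an error handler that tears down every stream of a manual pipe chain
function createErrorHandler (streams, onError) {
  let handled = false
  return function handleErr (err) {
    if (handled) return // several streams may fail in a cascade
    handled = true
    for (const stream of streams) {
      stream.destroy() // release file descriptors, sockets, buffers...
    }
    onError(err)
  }
}

// Usage with placeholder streams
const a = new PassThrough()
const b = new PassThrough()
const handleErr = createErrorHandler([a, b], err => {
  console.error(`Pipeline failed: ${err.message}`)
})

a.on('error', handleErr)
  .pipe(b)
  .on('error', handleErr)

a.emit('error', new Error('boom')) // simulate a failure in the first stream
```

Having to wire this up by hand for every chain is the motivation for the utilities in the next section.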

04. Stream utilities

stream.pipeline(...streams, callback) - Node.js 10+

// stream-copy-gzip-pipeline.js

const { pipeline } = require('stream')
const { createReadStream, createWriteStream } = require('fs')
const { createGzip } = require('zlib')

const [, , src, dest] = process.argv

pipeline(
  createReadStream(src),
  createGzip(),
  createWriteStream(dest),
  function onEnd (err) {
    if (err) {
      console.error(`Error: ${err}`)
      process.exit(1)
    }

    console.log('Done!')
  }
)

You can pass multiple streams (they will be piped)

The last argument is a callback. If invoked with an error, it means the pipeline failed at some point.

All the streams are ended and destroyed correctly.
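
If you prefer async/await over callbacks, pipeline can be wrapped with `util.promisify`. The following is a sketch under that assumption; `shout` and the inline streams are made up for illustration.

```javascript
// pipeline-promise.js (illustrative sketch)
const { pipeline, Readable, Transform, Writable } = require('stream')
const { promisify } = require('util')

const pipelineAsync = promisify(pipeline)

// Uppercases every word using a source -> transform -> destination pipeline
async function shout (words) {
  const result = []
  await pipelineAsync(
    Readable.from(words),
    new Transform({
      transform (chunk, encoding, done) {
        done(null, chunk.toString().toUpperCase())
      }
    }),
    new Writable({
      write (chunk, encoding, done) {
        result.push(chunk.toString())
        done()
      }
    })
  )
  return result
}

const shouted = shout(['lemon', 'banana'])
shouted
  .then(result => console.log(result.join(' ')))
  .catch(err => console.error(`Error: ${err}`))
```

A rejected promise plays the same role as the error argument of the callback: it means the pipeline failed at some point.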

For Node.js < 10: pump - npm.im/pump

// stream-copy-gzip-pump.js

const pump = require('pump') // from npm
const { createReadStream, createWriteStream } = require('fs')
const { createGzip } = require('zlib')

const [, , src, dest] = process.argv

pump( // just swap pipeline with pump!
  createReadStream(src),
  createGzip(),
  createWriteStream(dest),
  function onEnd (err) {
    if (err) {
      console.error(`Error: ${err}`)
      process.exit(1)
    }

    console.log('Done!')
  }
)

pumpify(...streams) - npm.im/pumpify

Create reusable pieces of pipeline

Let's create EncGz, an application that helps us to read and write encrypted-gzipped files

// encgz-stream.js - utility library

const {
  createCipheriv,
  createDecipheriv,
  randomBytes,
  createHash
} = require('crypto')
const { createGzip, createGunzip } = require('zlib')
const pumpify = require('pumpify') // from npm

// calculates md5 of the secret (trimmed)
function getChiperKey (secret) {}

function createEncgz (secret) {
  const initVect = randomBytes(16)
  const cipherKey = getChiperKey(secret)
  const encryptStream = createCipheriv('aes256', cipherKey, initVect)
  const gzipStream = createGzip()

  const stream = pumpify(encryptStream, gzipStream)
  stream.initVect = initVect

  return stream
}

// encgz-stream.js (...continues from the previous slide)

function createDecgz (secret, initVect) {
  const cipherKey = getChiperKey(secret)
  const decryptStream = createDecipheriv('aes256', cipherKey, initVect)
  const gunzipStream = createGunzip()

  const stream = pumpify(gunzipStream, decryptStream)
  return stream
}

module.exports = {
  createEncgz,
  createDecgz
}

// encgz.js - CLI to encrypt and gzip (from stdin to stdout)

const { pipeline } = require('stream')
const { createEncgz } = require('./encgz-stream')

const [, , secret] = process.argv

const encgz = createEncgz(secret)
console.error(`init vector: ${encgz.initVect.toString('hex')}`)

pipeline(
  process.stdin,
  encgz,
  process.stdout,
  function onEnd (err) {
    if (err) {
      console.error(`Error: ${err}`)
      process.exit(1)
    }
  }
)

// decgz.js - CLI to gunzip and decrypt (from stdin to stdout)

const { pipeline } = require('stream')
const { createDecgz } = require('./encgz-stream')

const [, , secret, initVect] = process.argv

const decgz = createDecgz(secret, Buffer.from(initVect, 'hex'))


pipeline(
  process.stdin,
  decgz,
  process.stdout,
  function onEnd (err) {
    if (err) {
      console.error(`Error: ${err}`)
      process.exit(1)
    }
  }
)

readable-stream - npm.im/readable-stream

npm package that contains the latest version of the Node.js stream library.

It also makes Node.js streams compatible with the browser (can be used with Webpack and Browserify)

* Yeah, the name is misleading. The package offers all the functionality of the core 'stream' module, not just readable streams.

04. Writing custom streams

EmojiStream → Uppercasify → DOMAppend

🍋 Lemon → 🍋 LEMON → 🍋 LEMON
🍌 Banana → 🍌 BANANA → 🍌 BANANA

class EmojiStream extends Readable {
  _read () {
    // ...
  }
}

class Uppercasify extends Transform {
  _transform (chunk, enc, done) {
    // ...
  }
}

class DOMAppend extends Writable {
  _write (chunk, enc, done) {
    // ...
  }
}

this.push(data): pass data to the next step

// emoji-stream.js (custom readable stream)
const { EMOJI_MAP } = require('emoji') // from npm
const { Readable } = require('readable-stream') // from npm
const emojis = Object.keys(EMOJI_MAP)
function getEmojiDescription (index) {
  return EMOJI_MAP[emojis[index]][1]
}
function getMessage (index) {
  return emojis[index] + ' ' + getEmojiDescription(index)
}

class EmojiStream extends Readable {
  constructor (options) {
    super(options)
    this._index = 0
  }

  _read () {
    if (this._index >= emojis.length) {
      return this.push(null)
    }
    return this.push(getMessage(this._index++))
  }
}

module.exports = EmojiStream

// uppercasify.js (custom transform stream)

const { Transform } = require('readable-stream')

class Uppercasify extends Transform {
  _transform (chunk, encoding, done) {
    this.push(chunk.toString().toUpperCase())
    done()
  }
}

module.exports = Uppercasify

// dom-append.js (custom writable stream)

const { Writable } = require('readable-stream')

class DOMAppend extends Writable {

  _write (chunk, encoding, done) {
    const elem = document.createElement('li')
    const content = document.createTextNode(chunk.toString())
    elem.appendChild(content)
    document.getElementById('list').appendChild(elem)
    done()
  }
}

module.exports = DOMAppend

05. Streams in the browser

// browser/app.js

const EmojiStream = require('../emoji-stream')
const Uppercasify = require('../uppercasify')
const DOMAppend = require('../dom-append')

const emoji = new EmojiStream()
const uppercasify = new Uppercasify()
const append = new DOMAppend()

emoji
  .pipe(uppercasify)
  .pipe(append)

Let's use webpack to build this app for the browser

npm i --save-dev webpack webpack-cli

node_modules/.bin/webpack src/browser/app.js

# creates dist/main.js

mv dist/main.js src/browser/app-bundle.js

Finally let's create an index.html

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8" />
    <meta
      name="viewport"
      content="width=device-width,initial-scale=1,shrink-to-fit=no"
    />
    <title>Streams in the browser!</title>
  </head>
  <body>
    <ul id="list"></ul>
    <script src="app-bundle.js"></script>
  </body>
</html>

06. Closing

TL;DR

Streams have a low memory footprint
Process data as soon as it's available
Composition through pipelines
Streams are abstractions:
- Readable = Input
- Transform = Business Logic
- Writable = Output

If you want to learn (even) moar 🐻 about streams...

nodejs.org/api/stream.html

github.com/substack/stream-handbook

github.com/lmammino/streams-workshop

If you are not convinced yet...

curl parrot.live

github.com/hugomd/parrot.live

Check out the codebase

Thank you!

loige.link/streams-manc

Credits

Cover picture by 1864750 from Pixabay
emojiart.org for the amazing emoji art

The internet for the memes! :D

Special thanks

@StefanoAbalsamo, @mariocasciaro, @machine_person, @Podgeypoos79, @katavic_d, @UrsoLuca, Austin Node.js meetup

With very practical examples we'll learn how streams work in Node.js & the Browser. With streams, you will be able to write elegant JavaScript applications that are much more composable and memory efficient! Streams are probably one of the most beautiful features of Node.js, but they are still largely underestimated and rarely used. Once you grasp the fundamentals, you'll be able to solve some ordinary programming challenges in a much more elegant and efficient way. With the power of streams in your tool belt, you'll be able to write applications that can deal with gigabytes or even terabytes of data efficiently. This talk will cover the following topics: Streams: when and how; Different types of streams; Built-in and custom streams; Composability; Utils & Streams in the browser.

Luciano Mammino

Cloud developer, entrepreneur, fighter, butterfly maker! #nodejs #javascript - Author of https://www.nodejsdesignpatterns.com , Founder of https://fullstackbulletin.com
