Functional programming and distributed data

Why should you care?

  • Understandable
  • Parallelizable

What is functional programming?

  • A style of programming in which pure functions are the main unit of computation
  • Think jQuery vs React
  • Possible in most languages, but easier in some

What makes a function pure?

  • No side effects
    • e.g. AJAX requests, writing to a DB, printing, changing external state
  • Same input → same output

Is it pure?

const square = (x) => {
  return x * x;
};

Is it pure?

class Counter {
  constructor() {
    this.count = 0;
  }

  increment() {
    this.count += 1;
    return this.count;
  }
}
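No: increment both reads and writes this.count, so calling it twice with the same (empty) input gives different outputs. A pure version, sketched here, makes the count an explicit input and output:

```javascript
// A pure counter: the current count is an explicit input,
// and the incremented count is an explicit output. Nothing is mutated.
const increment = (count) => count + 1;

increment(0); // 1
increment(0); // 1 (same input, same output, every time)
```

The caller, not the function, decides what to do with the new value.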

Is it pure?

const randPlusOne = () => {
  return Math.random() + 1;
};

Is it pure?

const age = (birthday) => {
  return new Date() - birthday;
};
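No: new Date() is a hidden input, so the same birthday gives a different answer every millisecond. A pure variant, sketched here, takes the current time as an explicit parameter:

```javascript
// Pure version: "now" is an explicit input instead of a hidden
// read of the system clock.
const age = (now, birthday) => now - birthday;

const birthday = Date.parse('1990-01-01');
age(birthday + 1000, birthday); // 1000; fully determined by its inputs
```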

Is it pure?

const setText = (newText) => {
  $('#thing').text(newText);
};
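No: it writes to the DOM through jQuery. A pure alternative, sketched here, returns a description of the change and leaves the actual DOM write to an imperative shell (the same idea React applies to components):

```javascript
// Pure: returns a description of the desired change.
// Some outer, imperative layer applies it to the real DOM.
const setText = (newText) => ({ selector: '#thing', text: newText });

setText('hello'); // { selector: '#thing', text: 'hello' }
```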

Is it pure?

const addLengths = (str1, str2) => {
  return str1.length + str2.length;
};

Is it pure?

const addNameLengths = (person1, person2) => {
  return person1.name.length + 
    person2.name.length;
};
const w = { name: 'Will' };
const g = { name: 'Grace' };

addNameLengths(w, g); // 9
w.name = 'William';
addNameLengths(w, g); // 12

Purity requires immutability
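One way to enforce this in JavaScript, sketched here with the example above: freeze the inputs so mutation can no longer change a pure function's answer.

```javascript
const addNameLengths = (person1, person2) =>
  person1.name.length + person2.name.length;

// Frozen inputs can't drift out from under the function:
const w = Object.freeze({ name: 'Will' });
const g = Object.freeze({ name: 'Grace' });

addNameLengths(w, g); // 9
// w.name = 'William' would now throw in strict mode
// (and is silently ignored otherwise), so:
addNameLengths(w, g); // still 9
```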

Is it pure?

const range = (n) => {
  const result = [];
  for (let i = 0; i < n; i++) {
    result.push(i);
  }
  return result;
};
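Yes, surprisingly: result is mutated, but the mutation is local and never observable by callers, so the function is still pure. For reference, a sketch with no mutation at all:

```javascript
// Same behavior, no mutation anywhere:
const range = (n) => Array.from({ length: n }, (_, i) => i);

range(4); // [0, 1, 2, 3]
```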

How can we do anything useful?

  • Functional core, imperative shell
  • Redux
    • Model state changes as pure reducers
    • Your code never mutates state
  • React
    • Model UI as pure components
    • Your code never mutates DOM
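The Redux bullet can be sketched with a hypothetical counter reducer (just the shape of the idea, not the real Redux API):

```javascript
// A reducer is a pure function: (state, action) => newState.
// It never mutates state; it returns a fresh value.
const counterReducer = (state, action) => {
  switch (action.type) {
    case 'INCREMENT':
      return { count: state.count + 1 };
    case 'DECREMENT':
      return { count: state.count - 1 };
    default:
      return state;
  }
};

const s0 = { count: 0 };
const s1 = counterReducer(s0, { type: 'INCREMENT' });

s1.count; // 1
s0.count; // still 0; the old state is untouched
```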

Understandable

  • Impure functions have hidden inputs and outputs
    • hidden inputs: mutable dependencies
    • hidden outputs: side effects
  • Impure functions are often coupled in invisible ways
  • Pure functions require all inputs/outputs to be explicit
  • Calling a pure function can never break other code
  • Values that change over time are difficult to keep track of
// To predict x, you must also know what impureThing reads and writes:
const x = impureThing(a, b);
// To predict x, you only need a and b:
const x = pureThing(a, b);

const makeTiramisu = (
  eggs, sugar1, wine, cheese, cream, 
  fingers, espresso, sugar2, cocoa
) => {
  dissolve(sugar2, espresso);
  const mixture = whisk(eggs);
  beat(mixture, sugar1, wine);
  whisk(mixture);
  whip(cream);
  beat(cheese);
  beat(mixture, cheese);
  fold(mixture, cream);
  assemble(mixture, fingers);
  sift(mixture, cocoa);
  refrigerate(mixture);
  return mixture;
};

Example: tiramisu recipe

const makeTiramisu = (
  eggs, sugar1, wine, cheese, cream, 
  fingers, espresso, sugar2, cocoa
) => {
  const beatEggs = beat(eggs);
  const mixture = beat(beatEggs, sugar1, wine);
  const whisked = whisk(mixture);
  const beatCheese = beat(cheese);
  const cheeseMixture = beat(whisked, beatCheese);
  const whippedCream = whip(cream);
  const foldedMixture = fold(cheeseMixture, whippedCream);
  const sweetEspresso = dissolve(sugar2, espresso);
  const wetFingers = soak2seconds(fingers, sweetEspresso);
  const assembled = assemble(foldedMixture, wetFingers);
  const complete = sift(assembled, cocoa);
  const readyTiramisu = refrigerate(complete);
  return readyTiramisu;
};

Example: tiramisu recipe

Parallelizable

  • Can't parallelize if we don't understand dependencies between steps
  • Mutable values make parallelization nearly impossible
let count = 5;

// Two workers running increment at the same time can both read 5
// and both write 6, losing an update. With mutation, order matters.
const increment = () => {
  count = count + 1;
};
// The loop fixes an order: each iteration mutates result in sequence.
const doubles = (arr) => {
  const result = [];
  for (let i = 0; i < arr.length; i++) {
    result.push(arr[i] * 2);
  }
  return result;
};

// With map, each element is computed independently of the others.
const doubles = (arr) => {
  return arr.map(x => x * 2);
};
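Because map has no dependencies between iterations, the work can be split freely. A sketch of the idea, with chunks standing in for workers or machines:

```javascript
const doublesChunk = (chunk) => chunk.map(x => x * 2);

// Each chunk could be doubled on a different worker; completion order
// doesn't matter because no chunk reads another chunk's data.
const chunks = [[1, 2], [3, 4], [5, 6]];
const doubled = chunks.flatMap(doublesChunk);

doubled; // [2, 4, 6, 8, 10, 12]
```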

Airbnb

  • Lots of transactions
  • Complex db schema
  • Incomprehensible to accountants

Apache Spark

  • "fast and general engine for large-scale data processing"
  • Supports Python, Java, Scala
  • Resilient Distributed Dataset (RDD)
// (Event, Rule) => Array[Entry]
const execute = (event, rule) => { ... };

// (Event, Rule) => Boolean
const applies = (event, rule) => { ... };

// (SparkContext) => RDD[Event]
const loadEvents = (sc) => { ... };

// (SparkContext) => RDD[Rule]
const loadRules = (sc) => { ... };

// (SparkContext, RDD[Entry]) => undefined
const saveEntries = (sc, entries) => { ... };
class Event { ... }
class Rule { ... }
class Entry { ... }
const sc = new SparkContext();
const events = loadEvents(sc);
const rules = loadRules(sc);
const entries = run(events, rules);
saveEntries(sc, entries);
// (RDD[Event], RDD[Rule]) => RDD[Entry]
const run = (events, rules) => {
  let result = [];

  events.forEach(event => {
    rules.forEach(rule => {
      if (applies(event, rule)) {
        const entries = execute(event, rule);
        result = result.concat(entries);
      }
    });
  });

  return result;
};
const makePair = (n) => [n, n];

[1, 2].map(makePair); // [[1, 1], [2, 2]]
[1, 2].flatMap(makePair); // [1, 1, 2, 2]
// (RDD[Event], RDD[Rule]) => RDD[Entry]
const run = (events, rules) => (
  rules.flatMap(rule => (
    events
      .filter(event => applies(event, rule))
      .flatMap(event => execute(event, rule))
  ))
);

Why doesn't everybody do this?

  • Historically, memory was too scarce to afford copying instead of mutating
  • Parallelism has only recently become a mainstream need
  • Imperative style is entrenched in education and language design
  • Doesn't always match our real-world intuition of objects changing over time
  • But things are changing!

What next?

Appendix: Performance

"If you want fast, start with comprehensible"

- Paul Phillips

Lazy evaluation

// Eager: filter builds [1, 3, 5], map squares all of them,
// and then we keep only index 1. The rest was wasted work.
[1, 2, 3, 4, 5]
  .filter(x => x % 2 !== 0)
  .map(x => x * x)
  [1]; // 9
// Lazy: Seq does no work until .get(1), and then only enough
// to produce the element at index 1.
import { Seq } from 'immutable';

Seq([1, 2, 3, 4, 5])
  .filter(x => x % 2 !== 0)
  .map(x => x * x)
  .get(1); // 9
// Laziness even makes infinite sequences practical:
import { Range } from 'immutable';

Range(1, Infinity)
  .filter(x => x % 2 !== 0)
  .map(x => x * x)
  .get(1); // 9

Memoization

import { memoize } from 'lodash';

const memMakePair = memoize(makePair);

memMakePair(1); // [1, 1]
memMakePair(1); // use cached value
const onePair = memMakePair(1);
onePair.push(2); // mutates the array sitting in the cache!
memMakePair(1); // [1, 1, 2]; the cached value has been corrupted
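One defense, sketched here (this is not lodash's API): freeze each result before caching it, so mutation fails loudly instead of silently corrupting the cache.

```javascript
// Hypothetical memoizer that freezes each result before caching it.
const memoizeFrozen = (fn) => {
  const cache = new Map();
  return (arg) => {
    if (!cache.has(arg)) {
      cache.set(arg, Object.freeze(fn(arg)));
    }
    return cache.get(arg);
  };
};

const makePair = (n) => [n, n];
const memMakePair = memoizeFrozen(makePair);

memMakePair(1); // [1, 1]
// memMakePair(1).push(2) now throws (push on a frozen array),
// so the cache stays intact:
memMakePair(1); // still [1, 1]
```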

Questions?

a/A functional programming

By Phil Nachum
