Functional programming and distributed data
Why should you care?
- Understandable
- Parallelizable
What is functional programming?
- A style of programming in which pure functions are the main unit of computation
- Think jQuery vs React
- Possible in most languages, but easier in some
What makes a function pure?
- No side effects
- e.g. AJAX requests, writing to a database, printing, changing external state
- Same input → same output
Is it pure?
const square = (x) => {
  return x * x;
};
Is it pure?
class Counter {
  constructor() {
    this.count = 0;
  }
  increment() {
    this.count += 1;
    return this.count;
  }
}
Is it pure?
const randPlusOne = () => {
  return Math.random() + 1;
};
Is it pure?
const age = (birthday) => {
  return new Date() - birthday;
};
Is it pure?
const setText = (newText) => {
  $('#thing').text(newText);
};
Is it pure?
const addLengths = (str1, str2) => {
  return str1.length + str2.length;
};
Is it pure?
const addNameLengths = (person1, person2) => {
  return person1.name.length +
    person2.name.length;
};
const w = { name: 'Will' };
const g = { name: 'Grace' };
addNameLengths(w, g); // 9
w.name = 'William';
addNameLengths(w, g); // 12
Purity requires immutability
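One way to guard against mutation like `w.name = 'William'` is to freeze objects and build updated copies instead. A minimal sketch using `Object.freeze` and spread (not from the slides):

```javascript
// Freeze the object so it can't be mutated after creation.
// (In strict mode, writing to a frozen object throws a TypeError;
// in sloppy mode the write silently does nothing.)
const w = Object.freeze({ name: 'Will' });

// To "change" the name, create a new object rather than mutating:
const w2 = { ...w, name: 'William' };

console.log(w.name);  // 'Will' — the original is unchanged
console.log(w2.name); // 'William'
```

With frozen inputs, `addNameLengths(w, g)` returns the same answer every time it is called with the same objects.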
Is it pure?
const range = (n) => {
  const result = [];
  for (let i = 0; i < n; i++) {
    result.push(i);
  }
  return result;
};
How can we do anything useful?
- Functional core, imperative shell
- Redux
- Model state changes as pure reducers
- Your code never mutates state
- React
- Model UI as pure components
- Your code never mutates DOM
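The reducer idea can be sketched in a few lines (a minimal stand-in for illustration, not Redux itself):

```javascript
// A reducer is a pure function: (state, action) => newState.
// It never mutates state; it returns a fresh value.
const counter = (state = 0, action) => {
  switch (action.type) {
    case 'INCREMENT':
      return state + 1;
    case 'DECREMENT':
      return state - 1;
    default:
      return state;
  }
};

counter(0, { type: 'INCREMENT' }); // 1
counter(1, { type: 'DECREMENT' }); // 0
```

Because the reducer is pure, any state can be reproduced by replaying the same actions from the same starting point.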
Understandable
- Impure functions have hidden inputs and outputs
- hidden inputs: mutable dependencies
- hidden outputs: side effects
- Impure functions are often coupled in invisible ways
- Pure functions require all inputs/outputs to be explicit
- Calling a pure function can never break other code
- Values that change over time are difficult to keep track of
const x = impureThing(a, b);
const x = pureThing(a, b);
const makeTiramisu = (
  eggs, sugar1, wine, cheese, cream,
  fingers, espresso, sugar2, cocoa
) => {
  dissolve(sugar2, espresso);
  const mixture = whisk(eggs);
  beat(mixture, sugar1, wine);
  whisk(mixture);
  whip(cream);
  beat(cheese);
  beat(mixture, cheese);
  fold(mixture, cream);
  assemble(mixture, fingers);
  sift(mixture, cocoa);
  refrigerate(mixture);
  return mixture;
};
Example: tiramisu recipe
const makeTiramisu = (
  eggs, sugar1, wine, cheese, cream,
  fingers, espresso, sugar2, cocoa
) => {
  const beatEggs = beat(eggs);
  const mixture = beat(beatEggs, sugar1, wine);
  const whisked = whisk(mixture);
  const beatCheese = beat(cheese);
  const cheeseMixture = beat(whisked, beatCheese);
  const whippedCream = whip(cream);
  const foldedMixture = fold(cheeseMixture, whippedCream);
  const sweetEspresso = dissolve(sugar2, espresso);
  const wetFingers = soak2seconds(fingers, sweetEspresso);
  const assembled = assemble(foldedMixture, wetFingers);
  const complete = sift(assembled, cocoa);
  const readyTiramisu = refrigerate(complete);
  return readyTiramisu;
};
Example: tiramisu recipe
Parallelizable
- Can't parallelize if we don't understand dependencies between steps
- Mutable values make parallelization nearly impossible
let count = 5;
const increment = () => {
  count = count + 1;
};
const doubles = (arr) => {
  const result = [];
  for (let i = 0; i < arr.length; i++) {
    result.push(arr[i] * 2);
  }
  return result;
};
const doubles = (arr) => {
  return arr.map(x => x * 2);
};
Airbnb
- Lots of transactions
- Complex db schema
- Incomprehensible to accountants
Apache Spark
- "fast and general engine for large-scale data processing"
- Supports Python, Java, Scala
- Resilient Distributed Dataset (RDD)
// (Event, Rule) => Array[Entry]
const execute = (event, rule) => { ... };
// (Event, Rule) => Boolean
const applies = (event, rule) => { ... };
// (SparkContext) => RDD[Event]
const loadEvents = (sc) => { ... };
// (SparkContext) => RDD[Rule]
const loadRules = (sc) => { ... };
// (SparkContext, RDD[Entry]) => undefined
const saveEntries = (sc, entries) => { ... };
class Event { ... }
class Rule { ... }
class Entry { ... }
const sc = new SparkContext();
const events = loadEvents(sc);
const rules = loadRules(sc);
const entries = run(events, rules);
saveEntries(sc, entries);
// (RDD[Event], RDD[Rule]) => RDD[Entry]
const run = (events, rules) => {
  let result = [];
  events.forEach(event => {
    rules.forEach(rule => {
      if (applies(event, rule)) {
        const entries = execute(event, rule);
        result = result.concat(entries);
      }
    });
  });
  return result;
};
const makePair = (n) => [n, n];
[1, 2].map(makePair); // [[1, 1], [2, 2]]
[1, 2].flatMap(makePair); // [1, 1, 2, 2]
// (RDD[Event], RDD[Rule]) => RDD[Entry]
const run = (events, rules) => (
  rules.flatMap(rule => (
    events
      .filter(event => applies(event, rule))
      .flatMap(event => execute(event, rule))
  ))
);
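With plain arrays standing in for RDDs, the functional `run` can be exercised end to end. The stub `applies` and `execute` below are made up for illustration; the real versions would encode Airbnb's accounting rules:

```javascript
// Hypothetical stubs — pure functions, invented for this sketch.
const applies = (event, rule) => event.type === rule.type;
const execute = (event, rule) => [{ amount: event.amount * rule.factor }];

// Same shape as the slide's functional run, over plain arrays.
const run = (events, rules) => (
  rules.flatMap(rule => (
    events
      .filter(event => applies(event, rule))
      .flatMap(event => execute(event, rule))
  ))
);

const events = [{ type: 'booking', amount: 100 }];
const rules = [{ type: 'booking', factor: 2 }];
run(events, rules); // [{ amount: 200 }]
```

Because every step is a pure `filter`/`flatMap`, Spark is free to partition `events` across machines and run the same code in parallel.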
Why doesn't everybody do this?
- Historical limitations in memory
- Parallelism only needed recently
- Entrenched in education and language design
- Doesn't always align with real world perception
- But things are changing!
- JavaScript
- Brian Lonsdorf
- Immutable.js
- Elm
What next?
Appendix: Performance
"If you want fast, start with comprehensible"
- Paul Phillips
Lazy evaluation
[1, 2, 3, 4, 5]
  .filter(x => x % 2 !== 0)
  .map(x => x * x)[1];
import { Seq } from 'immutable';
Seq([1, 2, 3, 4, 5])
  .filter(x => x % 2 !== 0)
  .map(x => x * x)
  .get(1);
import { Range } from 'immutable';
Range(1, Infinity)
  .filter(x => x % 2 !== 0)
  .map(x => x * x)
  .get(1);
Memoization
import { memoize } from 'lodash';
const memMakePair = memoize(makePair);
memMakePair(1); // [1, 1]
memMakePair(1); // use cached value
const onePair = memMakePair(1);
onePair.push(2);
memMakePair(1); // [1, 1, 2]
Questions?
a/A functional programming
By Phil Nachum