Map Reduce

Talk by Kieran Andrews

http://kieranandrews.com.au

Map Reduce

...programming model for processing large data sets with a parallel, distributed algorithm on a cluster.

Basics

Map

filtering and sorting

This distributes the set into smaller problems

Reduce

collects all the answers to the sub problems

and combines them

In Ruby

rainbow = ['red', 'orange', 'yellow', 'green', 'blue', 'purple']

Select

rainbow.select {|color| color.size >= 5} # ['orange', 'yellow', 'green', 'purple']

Map

rainbow.map {|color| color.upcase}# ['RED', 'ORANGE', 'YELLOW', 'GREEN', 'BLUE', 'PURPLE'

Reduce (inject)

# number of colorsrainbow.reduce(0) {|acc, n| acc += 1}# 6

How about Highly dIstributed?

The chunks (during map) can be processed in parallel

Distribute this across several servers (100s)

word_chunks.map do |chunk|
  assign_to_server(count_words(chunk)) #[{"the" => 1}, {"cat" => 1}, {"the" => 1], {"dog" => 1}]
end

Improves performance for large data sets

Example Time

Normally do on something like Hadoop

Local example showing the concept

http://www.bigfastblog.com/map-reduce-with-ruby-using-hadoop#coding-your-map-and-reduce-scripts-in-ruby

https://github.com/TigerWolf/ruby-map-reduce

Hadoop

open source project

for storing and large scale processing of data-sets on clusters of commodity hardware

apache foundation

MongoDB

Also supports mapreduce!

Try it out yourself, mapreduce with mongo in the browser

https://github.com/angelozerr/mongo-mapreduce-webbrowser

RubyGem for MapReduce with Mongo:

https://github.com/jcoene/mongoid-mapreduce

Fin

Thanks

Hope you enjoyed.

Map Reduce in Ruby

By Kieran Andrews

Map Reduce in Ruby

7,249

Map Reduce

Map Reduce

Basics

Map

Reduce

In Ruby

How about Highly dIstributed?

Example Time

Hadoop

MongoDB

Fin

Map Reduce in Ruby

More from Kieran Andrews