Map Reduce

Talk by Kieran Andrews

Map Reduce

...programming model for processing large data sets with a parallel, distributed algorithm on a cluster.



Map

Filtering and sorting

This splits the data set into smaller sub-problems


Reduce

Collects all the answers to the sub-problems

and combines them
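
A minimal sketch of the two phases in plain Ruby. The list of numbers and the chunk size of 3 are made up for illustration and are not from the talk: the first step splits the set into chunks and solves each one, the second collects the partial answers and combines them.

```ruby
# Hypothetical input and chunk size, chosen only for illustration.
numbers = (1..10).to_a

# Map phase: split the set into smaller problems and solve each one.
chunks = numbers.each_slice(3).to_a
partial_sums = chunks.map { |chunk| chunk.sum }

# Reduce phase: collect the sub-answers and combine them into one result.
total = partial_sums.reduce(0) { |acc, n| acc + n }

puts total
```

Each chunk is summed without looking at any other chunk, which is what would let the chunks be handed to different machines.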

In Ruby

rainbow = ['red', 'orange', 'yellow', 'green', 'blue', 'purple']

Select
rainbow.select {|color| color.size >= 5} # ['orange', 'yellow', 'green', 'purple']

Map
rainbow.map {|color| color.upcase} # ['RED', 'ORANGE', 'YELLOW', 'GREEN', 'BLUE', 'PURPLE']

Reduce (inject)
# number of colors
rainbow.reduce(0) {|acc, n| acc + 1} # 6
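
The three calls can also be chained into one pipeline. This is only a sketch: the variable names and the "total letters" question are invented here, not taken from the slides. Note that the block passed to reduce must return the new accumulator, and that inject is an alias for reduce.

```ruby
rainbow = ['red', 'orange', 'yellow', 'green', 'blue', 'purple']

# Filter: keep only the colors with five or more letters.
long_names = rainbow.select { |color| color.size >= 5 }

# Transform: upcase each remaining color.
shouted = long_names.map { |color| color.upcase }

# Fold: add up the letters; the block returns the new accumulator.
total_letters = shouted.reduce(0) { |acc, color| acc + color.size }

p shouted
p total_letters
```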

How about highly distributed?

The chunks (during map) can be processed in parallel
Distribute this across several servers (100s)
# chunks is an assumed name for the input split into pieces
chunks.each do |chunk|
  assign_to_server(count_words(chunk)) # [{"the" => 1}, {"cat" => 1}, {"the" => 1}, {"dog" => 1}]
end
Improves performance for large data sets
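
A rough local sketch of the same idea, with threads standing in for servers. Everything here is an assumption for illustration: the input strings are made up, and count_words is a hypothetical implementation of the helper named on the slide. Threads in standard Ruby do not give true CPU parallelism, so this shows only the shape of the computation, not the speed-up.

```ruby
# Hypothetical stand-in for count_words: counts the words in one chunk.
def count_words(chunk)
  chunk.split.each_with_object(Hash.new(0)) do |word, counts|
    counts[word.downcase] += 1
  end
end

# Made-up input, already split into chunks (one per pretend server).
chunks = ['the cat', 'the dog', 'the cat sat']

# Map: each chunk is counted independently, here in its own thread.
threads = chunks.map { |chunk| Thread.new { count_words(chunk) } }
partial_counts = threads.map(&:value)

# Reduce: merge the per-chunk hashes, adding the counts for shared words.
totals = partial_counts.reduce({}) do |acc, counts|
  acc.merge(counts) { |_word, a, b| a + b }
end

p totals
```

Because no chunk depends on another, the map step could run anywhere; only the final merge needs to see all the partial results.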

Example Time

Normally you would do this on something like Hadoop
Local example showing the concept


Hadoop

Open source project
for storing and large-scale processing of data sets on clusters of commodity hardware
Apache Software Foundation


MongoDB also supports MapReduce!
Try it out yourself: MapReduce with Mongo in the browser

RubyGem for MapReduce with Mongo:



Hope you enjoyed.

