Big data for the masses
Marcin Stożek "Perk" / @marcinstozek
how long would it take to sort 15 PB of data on one computer?
(hint: in days rather than hours)
why not distribute the data, sort it chunk by chunk and just merge the results?
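A small single-machine Java sketch of that idea, with made-up numbers standing in for data spread over three machines: sort each chunk on its own, then merge the already-sorted chunks.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SortByChunks {

    // merge two already-sorted lists into one sorted list
    static List<Integer> merge(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>(a.size() + b.size());
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            out.add(a.get(i) <= b.get(j) ? a.get(i++) : b.get(j++));
        }
        out.addAll(a.subList(i, a.size()));
        out.addAll(b.subList(j, b.size()));
        return out;
    }

    public static void main(String[] args) {
        // three unsorted chunks, standing in for data spread over three machines
        List<List<Integer>> chunks = Arrays.asList(
                new ArrayList<>(Arrays.asList(9, 1, 12, 4)),
                new ArrayList<>(Arrays.asList(7, 2, 10, 5)),
                new ArrayList<>(Arrays.asList(3, 11, 8, 6)));

        // step 1: each machine sorts only its own (small) chunk
        for (List<Integer> chunk : chunks) {
            Collections.sort(chunk);
        }

        // step 2: the sorted chunks are merged into the final ordering
        List<Integer> sorted = Collections.emptyList();
        for (List<Integer> chunk : chunks) {
            sorted = merge(sorted, chunk);
        }
        System.out.println(sorted);   // [1, 2, 3, ..., 12]
    }
}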
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
list = [1, 2, 3, 4]   list2 = [5, 6, 7, 8]   list3 = [9, 10, 11, 12]

sqr k = k * k
sqr list -> [1, 4, 9, 16]
sqr list2 -> [25, 36, 49, 64]
sqr list3 -> [81, 100, 121, 144]

add [a, .., z] = a + .. + z
add (sqr list) -> 30
add (sqr list2) -> 174
add (sqr list3) -> 446
add (sqr list) (sqr list2) (sqr list3) -> 650
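A runnable Java sketch of the same squares-and-sums example (the class name is just for illustration; the chunk variables mirror the pseudocode above):

import java.util.Arrays;

public class SumOfSquares {

    // sqr k = k * k, applied to every element of a chunk
    static int[] sqr(int[] chunk) {
        return Arrays.stream(chunk).map(k -> k * k).toArray();
    }

    // add [a, .., z] = a + .. + z
    static int add(int[] chunk) {
        return Arrays.stream(chunk).sum();
    }

    public static void main(String[] args) {
        int[] list  = {1, 2, 3, 4};
        int[] list2 = {5, 6, 7, 8};
        int[] list3 = {9, 10, 11, 12};

        // each chunk is squared and summed on its own
        // (this part could run on three separate machines)
        int partial1 = add(sqr(list));    // 30
        int partial2 = add(sqr(list2));   // 174
        int partial3 = add(sqr(list3));   // 446

        // merging the small partial results gives the answer for the whole data set
        System.out.println(partial1 + partial2 + partial3);   // 650
    }
}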
...distributed processing of large data sets across clusters of computers...
Hadoop is a computing framework
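As a rough sketch of how the squares-and-sums example could look with Hadoop's Java MapReduce API (the class names and the assumption of one integer per input line are illustrative, not from the talk): the mapper plays the role of sqr, the reducer plays the role of add, and Hadoop spreads the chunks across the cluster.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SumOfSquaresJob {

    // map: like sqr -- each mapper squares the numbers in its own chunk of the input
    public static class SqrMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final Text KEY = new Text("sum-of-squares");

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String s = line.toString().trim();
            if (s.isEmpty()) {
                return;                                     // skip blank lines
            }
            long k = Long.parseLong(s);
            context.write(KEY, new LongWritable(k * k));    // sqr k = k * k
        }
    }

    // reduce: like add -- sums all the squares the mappers emitted
    public static class AddReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable v : values) {
                sum += v.get();
            }
            context.write(key, new LongWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "sum of squares");
        job.setJarByClass(SumOfSquaresJob.class);
        job.setMapperClass(SqrMapper.class);
        job.setCombinerClass(AddReducer.class);   // partial sums per chunk, like add (sqr list) etc.
        job.setReducerClass(AddReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Packaged into a jar, a job like this would be launched with something along the lines of hadoop jar sum-of-squares.jar SumOfSquaresJob <input dir> <output dir>.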
You can think of them like Linux distributions
Internet search is your friend here
Marcin Stożek "Perk" / @marcinstozek