ActorFS

Cloud Object File System for Big Data

Solution for Unstructured Data Organization

Miras Technologies, Luxembourg
June 2014

What is ActorFS




Designed for Big Data
File, Object, Page
Integration: HCFS, REST, POSIX, Akka, Scala
Web Console + Command Shell


General Problem

Storage System is a "Must Have"

[Figure: Volume, Velocity, Veracity, Variety. Labels: Data, Users, Machine Data, Machine Fault, Apps, Growth, Change, Input Data, Control, Utilities]

Features


[Figure: ActorFS features. Labels: Console, Reactiveness, Multi-Core, Interfaces, CUDA, Simplicity, Hadoop, Spark, REST, Akka, POSIX, Multi User]

Map/Reduce

10 lines of code in Console
3 lines effective


twister.map {
  (word: String) =>
    (word, 1)
}.reduce((word: String) => 0) {
  (prev: (String, Int), next: (String, Int)) =>
    (prev._1, prev._2 + next._2)
}.result {
  word: (String, Int) =>
    println(s"result $word")
}
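For readers without the console at hand, the three stages above can be approximated with the Scala standard library alone. This is a sketch under assumptions, not the twister API: `mapStage`, `reduceStage` and the sample lines are illustrative names and data that do not come from ActorFS.

```scala
object WordCountSketch {
  // Map stage: emit a (word, 1) pair for every whitespace-separated token.
  def mapStage(lines: Seq[String]): Seq[(String, Int)] =
    lines.flatMap(_.split("\\s+").toSeq).filter(_.nonEmpty).map(word => (word, 1))

  // Reduce stage: merge pairs that share a word by adding their counts,
  // starting from 0 for a word that has not been seen yet.
  def reduceStage(pairs: Seq[(String, Int)]): Map[String, Int] =
    pairs.foldLeft(Map.empty[String, Int]) {
      case (counts, (word, n)) =>
        counts.updated(word, counts.getOrElse(word, 0) + n)
    }

  def main(args: Array[String]): Unit = {
    // Hypothetical input, standing in for data that would live in ActorFS.
    val lines = Seq("to be or not to be", "be reactive")
    // Result stage: print each (word, count) pair.
    reduceStage(mapStage(lines)).foreach(word => println(s"result $word"))
  }
}
```

The fold starts every unseen word at 0, which plays the same role as the initial value passed to `reduce` in the console snippet.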

Console

File systems are complex; a console simplifies them



    _____            __                __________ _________
  /  _  \    ___ __/  |  ____ ______ \_   _____//   _____/
 /  /_\  \ / ___\\   __\/    \\_  __ \|  ____)  \_____  \
/    |    \\ \___ |  | (  ()  )|  | \/|     \   ______   \
\____|__  / \__  ||__|  \____/ |__|   \___  /  \_______  /
        \/     \/                         \/           \/
      
Copyright 2013-2014 Miras Technologies International
actorfs> help

Object Interface

os / "cities"


talk {
  afs: Afs =>
    import afs._

    val col = os / "cities"
    col << (1, "Shanghai", 24150000)
    col << (2, "Beijing", 21150000)
    col << (3, "Karachi", 21142000)
    col << (4, "Lagos", 16060307)
    col << (5, "Istanbul", 14160000)
    col << (6, "Guangzhou", 12700000)
    col << (7, "Mumbai", 12400000)
    col << (8, "Moscow", 12111000)
    col << (9, "São Paulo", 11820000)

    os / "cities" >> println
}
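The `<<` and `>>` operators read as "write into" and "read out of" a collection. How ActorFS implements them is not shown in these slides; the sketch below is a purely hypothetical in-memory stand-in (`ToyCollection` is an invented name) that only illustrates how such operators can be defined in Scala.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical in-memory stand-in for a collection; not the ActorFS API.
final class ToyCollection[A] {
  private val rows = ArrayBuffer.empty[A]

  // Append one record and return the collection so calls can be chained.
  def <<(row: A): ToyCollection[A] = { rows += row; this }

  // Feed every stored record, in insertion order, to the given function.
  def >>(f: A => Unit): Unit = rows.foreach(f)

  def size: Int = rows.size
}

object ToyCollectionDemo {
  def main(args: Array[String]): Unit = {
    val col = new ToyCollection[(Int, String, Int)]
    // Double parentheses make the tuple argument explicit.
    col << ((3, "Karachi", 21142000))
    col << ((4, "Lagos", 16060307))
    col >> println
  }
}
```

Returning `this` from `<<` is a design choice that allows chained writes; the real object store may behave differently.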


Paging is more Efficient

Paging is impossible with centralized systems
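One possible reading of the efficiency claim, sketched with plain Scala iterators: a paged reader hands out the first page without waiting for the whole data set, while a blocking reader materializes everything first. `readAll`, `readPaged` and the page size are assumptions made for illustration, not ActorFS calls.

```scala
object PagingSketch {
  // Blocking style: materialize every record before the caller sees any.
  def readAll(source: Iterator[Int]): Vector[Int] = source.toVector

  // Paged style: hand records out lazily in fixed-size pages.
  def readPaged(source: Iterator[Int], pageSize: Int): Iterator[Seq[Int]] =
    source.grouped(pageSize)

  def main(args: Array[String]): Unit = {
    var produced = 0
    // Hypothetical source that counts how many records it has produced.
    val records = Iterator.tabulate(1000) { i => produced += 1; i }
    val firstPage = readPaged(records, pageSize = 100).next()
    println(s"first page: ${firstPage.size} records, produced so far: $produced")
  }
}
```

The counter shows that only a fraction of the source is consumed before the first page is available.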


[Figure: Blocking vs. Paging]



Search

Fully Distributed, Embedded Index Tree
16 nodes = 9,600,000 keys/s
Asynchronous
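As a quick check on the figure above, the per-node rate follows by division, assuming the load is spread evenly across the 16 nodes (the slides do not state this). The names in the sketch are illustrative.

```scala
object IndexThroughput {
  // Figures quoted on the slide.
  val nodes: Int = 16
  val clusterKeysPerSecond: Long = 9600000L

  // Assumption: throughput is divided evenly among the nodes.
  def perNode(total: Long, n: Int): Long = total / n

  def main(args: Array[String]): Unit =
    // prints "600000 keys/s per node"
    println(s"${perNode(clusterKeysPerSecond, nodes)} keys/s per node")
}
```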

[Figure: index tree. Labels: Node, Machine, Link]

Semi-Ordered IO

3 joint readers vs. 3 independent readers

[Figure: Asynchronous, Synchronous and Semi-Ordered IO. Labels: Next, Current, Scheduled]

in-Memory

with-SSD + over-Disk
All Supported
[Figure: Disk, DRAM and SSD compared by Capacity, Latency and Randomness. Values shown: 64Gbps, 1Gbps, 2Gbps; 1G = 1x, 1G = 40x, 1G = 4x]

CUDA

27x speedup on encryption
Asynchronous Operations

[Figure: Manycore GPU (scalable parallel processing, throughput-optimized architecture) vs. Multicore CPU (fast serial processing, latency-optimized architecture)]

Applications


1- High Volume Data Collection
2- Asynchronous Data Crunching
3- Parallel Indexing (with 16 nodes, about 10 million keys/s)
4- High throughput data crunching with Spark and Hadoop
5- Unstructured data federation and management
6- Cloud bursting and storage integration
7- Security control over organizational data assets
8- Geographically distributed data integration and processing
9- Sensor Data Collection
10- Reactive Data Crunching using Akka
11- Data Staging and Pruning
12- Cloud Archival & Backups






Reactive Systems are Reliable
Good Luck


Miras Tech Int.
Luxembourg

Word Count with Hadoop

Let's start with the imports first!


import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;


This is the official example code

Map

(word,1)

public class WordCount {

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }


Reduce

_+_

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }


Execution

oh!

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        Job job = new Job(conf, "wordcount");

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.waitForCompletion(true);
    }

}

Hadoop: 63 lines vs. ActorFS: 10 lines

Advantages




Save time: No compilation tool chain

Easy Learning: Say what you think, ActorFS knows how

Performance: use CUDA to save storage

ActorFS

By Hamed Ghasemzadeh