Oleksiy Dyagilev
ABS (anti-lock brakes )
railway switching system
chess playing program
video streaming
analytical applications
IoT(Internet of Things)
mobile devices
sensor data (meteo, genomics, geo, bio, etc)
social media
Internet search
user activity tracking
software logs
A high-throughput distributed messaging system
each time data traverses the user-kernel boundary, it must be copied, which consumes CPU cycles and memory bandwidth. Benchmark shows that time reduced in more than 2x with zero copy.
java.nio.channels.FileChannel.transferTo() available on linux
traditional approach
zero copy
Consumer:
Producer:
To guarantee exactly-once:
Environment:
Results:
Free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing
class SplitSentence extends BaseBasicBolt {
@Override
public void execute(Tuple tuple, BasicOutputCollector collector) {
String string = tuple.getString(0);
for (String word : string.split(" ")) {
collector.emit(new Values(word));
}
}
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("word"));
}
}
class WordCount extends BaseBasicBolt {
Map<String, Integer> counts = new HashMap<>();
@Override
public void execute(Tuple tuple, BasicOutputCollector collector) {
String word = tuple.getString(0);
Integer count = counts.get(word);
if (count == null)
count = 0;
count++;
counts.put(word, count);
collector.emit(new Values(word, count));
}
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("word", "count"));
}
}
public static void main(String[] args) throws Exception {
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new RandomSentenceSpout(), 5);
builder.setBolt("split", new SplitSentence(), 8).shuffleGrouping("spout");
builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", new Fields("word"));
Config conf = new Config();
conf.setNumWorkers(3);
StormSubmitter.submitTopologyWithProgressBar("word-counter", conf, builder.createTopology());
}
Trident is a high-level abstraction for doing stateful, incremental processing on top of persistence store
TridentState wordCounts = topology
.newStream("spout1", spout).parallelismHint(16)
.each(new Fields("sentence"), new Split(), new Fields("word"))
.groupBy(new Fields("word"))
.persistentAggregate(new MemoryMapState.Factory(), new Count(), new Fields("count")).parallelismHint(16);
topology.newDRPCStream("words", drpc)
.each(new Fields("args"), new Split(), new Fields("word"))
.groupBy(new Fields("word"))
.stateQuery(wordCounts, new Fields("word"), new MapGet(), new Fields("count"))
.each(new Fields("count"), new FilterNull())
.aggregate(new Fields("count"), new Sum(), new Fields("sum"));
Consider 'transactional' spout:
highly scalable equivalent of Realtime Google Analytics on top of Storm and GigaSpaces.
Application can be deployed to cloud with one click using Cloudify
Code available on github
PageView:
{
“sessionId”: “sessionid581239234”,
“referral”: “https://www.google.com/#q=gigaspace”,
“page”: “http://www.gigaspaces.com/about”,
“ip”: “89.162.139.2”
}
Resources: