Streaming Applications with Akka Streams
geekcamp Indonesia - 15 July 2017
About Me
- Senior Software Engineer at Citadel Technology Solutions
- Currently working in:
- Scala
- Kotlin
- Currently 'spiking' in:
- Elixir
- Elm
- Dart
- Giving back to the community:
- OSS project maintainer
- Singapore Scala Meetup group organiser
- Engineers.SG volunteer
_hhandoko
hhandoko
hhandoko.com
Engineers.SG
- Community initiative to help document Singapore's tech and startup scene
- 1800+ videos of local Meetups, conferences, and other developer events
- Support Michael on Patreon!
https://www.patreon.com/coderkungfu
Who? What?
Target Audience
- Anyone interested in streaming applications or stream processing:
- Developers
- Solutions Architects
- Product Managers
- etc.
- Helpful to have some programming experience, but no prior Scala or Akka knowledge necessary
Agenda and Objectives
- Let's agree on some terms and definitions...
- What problems are streaming applications solving?
- What can Akka offer stream processing?
- Show me the money! (or just a demo...)
- What else is out there?
Do you mean...?
Streams
- A sequence of data elements made available over time
- Processed differently from batch data
- Streams are codata (potentially unlimited / infinite)
- Streams are everywhere:
- Event streams
- Real-time metrics
- Streaming media
- etc.
Stream Processing
"Given a sequence of data (a stream), a series of operations is applied to each element in the stream."
- A computer programming paradigm:
- Dataflow programming
- Event stream processing
- Reactive programming
- Think about how the map operation works on a collection
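To make the comparison with map concrete, here is a minimal sketch in plain Scala (no Akka needed). The object and value names are illustrative only; the point is that the same operation applies to a finite collection and to a potentially infinite, lazily evaluated sequence.

```scala
object MapSketch {
  // A finite collection: map visits every element eagerly and returns a new collection.
  val doubledList: List[Int] = List(1, 2, 3).map(_ * 2)

  // A potentially infinite sequence: Iterator is lazy, so map only describes the
  // operation, and elements are transformed one at a time as they are pulled.
  def naturals: Iterator[Int] = Iterator.from(1)
  val firstFiveDoubled: List[Int] = naturals.map(_ * 2).take(5).toList
}
```

The second value only terminates because take(5) bounds the sequence before it is collected.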
Streaming (Data) Application
"A non-hard real-time system that makes its data available at the moment a client application needs it."
[1] - Psaltis, A.G., 2017, Streaming Data, Manning Publications, pp. 8-9
Fast Data
"Depending on use types, the speed at which organizations can convert data into insight and then to action is considered just as critical as the ability to leverage big data, if not more so. In fact, more than half (54%) of respondents stated that they consider leveraging fast data to be more important than leveraging big data."
Big Data or Fast Data?
Fast Data
- Infinite / ephemeral flow
- Per-element
- Tactical
- Proactive
- Data in motion
Big Data
- Finite
- Batch
- Strategic
- Reactive
- Data at rest
Big Data and Fast Data
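The batch versus per-element distinction can be sketched in plain Scala. The readings below are made-up sample values, and the names are illustrative; this is only meant to show the difference in when a result becomes available.

```scala
object BatchVsPerElement {
  // Made-up sample data for illustration.
  val readings: List[Int] = List(3, 5, 4, 8)

  // Batch style: wait for the whole (finite) data set, then compute once.
  val batchTotal: Int = readings.sum

  // Per-element style: update the result as each element arrives.
  // scanLeft yields the running total after every element; tail drops the initial 0.
  val runningTotals: List[Int] = readings.scanLeft(0)(_ + _).tail
}
```

The per-element version has an answer after every reading, while the batch version has one only at the end.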
What's all this?
Akka
"Coarse-grained concurrency library and runtime, emphasizing actor-based concurrency with inspiration drawn from Erlang."
- Actors are stateful entities which communicate via message passing:
- Concurrent and parallel
- Asynchronous and non-blocking
- Supervision and monitoring
Actor and Streams
- Actors model stream processing well:
- Receive (and send) messages
- Use a (bounded) mailbox
- Process messages sequentially
- However, not without challenges:
- Buffer (and mailbox) overflows
- Wiring errors
- Hard to conceptualise flow at higher level
- Actors do not compose like normal functions
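As a rough illustration of the first three points, here is a small classic actor that keeps a running total. It assumes akka-actor 2.5.x on the classpath; the Totaliser class and its messages are hypothetical names made up for this sketch.

```scala
import akka.actor.{Actor, ActorSystem, Props}
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.Await
import scala.concurrent.duration._

object ActorSketch {
  // Hypothetical messages for this illustration.
  case class Add(n: Int)
  case object GetTotal

  // A stateful actor: messages are taken from its mailbox and handled one at a
  // time, so `total` needs no locking.
  class Totaliser extends Actor {
    private var total = 0
    def receive: Receive = {
      case Add(n)   => total += n
      case GetTotal => sender() ! total
    }
  }

  def run(): Int = {
    val system = ActorSystem("actor-sketch")
    implicit val timeout: Timeout = Timeout(3.seconds)
    try {
      val totaliser = system.actorOf(Props(new Totaliser), "totaliser")
      (1 to 10).foreach(n => totaliser ! Add(n)) // fire-and-forget sends
      // Messages from a single sender are delivered in order, so GetTotal is handled last.
      Await.result((totaliser ? GetTotal).mapTo[Int], 3.seconds)
    } finally system.terminate()
  }
}
```

Note that the wiring is by hand: nothing in the types says which actor feeds which, which is part of why larger flows become hard to follow.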
Akka Streams
- Provides a way to express and run a chain of async processing steps acting on a sequence of elements
- Frees the developer to think about the bigger picture, composing a pipeline of functions (with actors)
- Bounded resource usage via Reactive Streams
- Limit buffering
- Slow down producers if consumers cannot keep up (backpressure)
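A minimal sketch of bounded resource usage, assuming akka-stream 2.5.x on the classpath. The buffer size and throttle rate are arbitrary values picked for illustration: the source could emit all 20 elements at once, but the slow stage downstream makes it wait.

```scala
import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, OverflowStrategy, ThrottleMode}
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.Await
import scala.concurrent.duration._

object BoundedPipelineSketch {
  def run(): Seq[Int] = {
    implicit val sys: ActorSystem = ActorSystem("bounded-pipeline")
    implicit val mat: ActorMaterializer = ActorMaterializer()
    try {
      val result = Source(1 to 20)                       // a fast producer
        .buffer(4, OverflowStrategy.backpressure)        // buffer at most 4 elements, then push back
        .throttle(1, 10.millis, 1, ThrottleMode.Shaping) // a deliberately slow stage
        .map(_ * 2)
        .runWith(Sink.seq)
      Await.result(result, 5.seconds)
    } finally sys.terminate()
  }
}
```

No element is dropped: when the buffer is full, the backpressure strategy stops demand from reaching the source instead of discarding data.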
Reactive Streams
- Initiative to provide a standard for async stream processing
- In essence:
- Process a potentially unbounded number of elements
- in a sequence
- asynchronously passing elements between components
- with mandatory non-blocking backpressure
Backpressure
- Signalling (notify demand to the producer)
- Ensures the publisher emits messages no faster than the subscriber can consume them
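Demand signalling can be modelled in a few lines of plain Scala. This is a simplified, synchronous sketch: the Publisher and Subscriber classes here are hypothetical stand-ins and not the org.reactivestreams interfaces, which are asynchronous and callback-based.

```scala
import scala.collection.mutable.ArrayBuffer

object DemandSketch {
  final class Publisher(elements: Iterator[Int]) {
    // Emits at most `n` elements: the publisher never sends more than was requested.
    def request(n: Int): List[Int] = {
      val batch = ArrayBuffer.empty[Int]
      while (batch.size < n && elements.hasNext) batch += elements.next()
      batch.toList
    }
  }

  final class Subscriber(batchSize: Int) {
    val received = ArrayBuffer.empty[Int]
    // Signals demand only when ready for more, so it cannot be overwhelmed.
    def pull(publisher: Publisher): Int = {
      val batch = publisher.request(batchSize)
      received ++= batch
      batch.size
    }
  }

  def run(): (Int, List[Int]) = {
    val publisher  = new Publisher(Iterator.from(1)) // unbounded source
    val subscriber = new Subscriber(batchSize = 3)
    val first = subscriber.pull(publisher) // demand 3: receives 1, 2, 3
    subscriber.pull(publisher)             // demand 3 more: receives 4, 5, 6
    (first, subscriber.received.toList)
  }
}
```

Even though the source is infinite, only six elements are ever produced, because only six were requested.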
Akka Streams Primer
ActorSystem
- A hierarchical group of actors which share common configuration, e.g. dispatchers, deployments, remote capabilities and addresses
- The entry point for creating or looking up actors
Materializer
- The magic behind the scenes
- Converts a list of akka.stream.scaladsl.Flow into org.reactivestreams.Processor instances
- Applies 'Operator Fusion' optimisations
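One way to see what the materializer does is to separate the blueprint from its execution. The sketch below assumes akka-stream 2.5.x on the classpath; the names are illustrative. The graph is an immutable description, and each run() produces an independent running stream with its own result.

```scala
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Keep, RunnableGraph, Sink, Source}
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

object MaterializerSketch {
  // A blueprint only: nothing runs at this point.
  val sumBlueprint: RunnableGraph[Future[Int]] =
    Source(1 to 100).toMat(Sink.fold[Int, Int](0)(_ + _))(Keep.right)

  def run(): (Int, Int) = {
    implicit val sys: ActorSystem = ActorSystem("materializer-sketch")
    implicit val mat: ActorMaterializer = ActorMaterializer()
    try {
      // Each run() asks the materializer for a fresh, independent running stream.
      val first  = sumBlueprint.run()
      val second = sumBlueprint.run()
      (Await.result(first, 3.seconds), Await.result(second, 3.seconds))
    } finally sys.terminate()
  }
}
```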
Source[+Out, M1]
- The starting point of the stream, where the data flowing through the stream originates from
val sourceFromRange = Source(1 to 1000)
val sourceFromIterable = Source(List(1,2,3))
val sourceFromFuture = Source.fromFuture(Future.successful("hello"))
val sourceWithSingleElement = Source.single("just one")
val sourceEmittingTheSameElement = Source.repeat("again and again")
val emptySource = Source.empty
- Has one output but no input
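A source does nothing until it is run. As a small sketch (assuming akka-stream 2.5.x; names are illustrative), two of the sources above can be collected into sequences like this:

```scala
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.Await
import scala.concurrent.duration._

object SourceSketch {
  def run(): (Seq[Int], Seq[String]) = {
    implicit val sys: ActorSystem = ActorSystem("source-sketch")
    implicit val mat: ActorMaterializer = ActorMaterializer()
    try {
      val fromRange = Source(1 to 5).runWith(Sink.seq)
      // An endless source must be bounded (here with take) before it is collected.
      val fromRepeat = Source.repeat("again").take(3).runWith(Sink.seq)
      (Await.result(fromRange, 3.seconds), Await.result(fromRepeat, 3.seconds))
    } finally sys.terminate()
  }
}
```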
Flow[-In, +Out, M2]
- A processing step within the stream, which combines one incoming channel and one outgoing channel and applies some transformation
val flowDoublingElements = Flow[Int].map(_ * 2)
val flowFilteringOutOddElements = Flow[Int].filter(_ % 2 == 0)
val flowBatchingElements = Flow[Int].grouped(10)
val flowBufferingElements = Flow[String].buffer(1000, OverflowStrategy.backpressure)
- Has one input and one output
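Flows are plain values that can be chained with via. The following sketch (assuming akka-stream 2.5.x; the flow names and the sample range are made up for illustration) chains three flows similar to the ones above and collects the result:

```scala
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, Sink, Source}
import scala.concurrent.Await
import scala.concurrent.duration._

object FlowSketch {
  // Reusable steps; none of them runs until attached to a source and a sink.
  val doubling     = Flow[Int].map(_ * 2)
  val multiplesOf4 = Flow[Int].filter(_ % 4 == 0)
  val pairing      = Flow[Int].grouped(2)

  def run(): Seq[Seq[Int]] = {
    implicit val sys: ActorSystem = ActorSystem("flow-sketch")
    implicit val mat: ActorMaterializer = ActorMaterializer()
    try {
      // 1..6 doubled is 2, 4, 6, 8, 10, 12; the filter keeps 4, 8, 12; grouped(2) batches them.
      val result = Source(1 to 6).via(doubling).via(multiplesOf4).via(pairing).runWith(Sink.seq)
      Await.result(result, 3.seconds)
    } finally sys.terminate()
  }
}
```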
Sink[-In, M3]
- The ultimate destination of all the messages flowing through the stream
val sinkPrintingOutElements = Sink.foreach[String](println(_))
val sinkCalculatingASumOfElements = Sink.fold[Int, Int](0)(_ + _)
val sinkReturningTheFirstElement = Sink.head
val sinkNoop = Sink.ignore
- Has one input but no output
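Although a sink has no output port, running a stream with it hands back the sink's materialized value, which is how results leave the stream. A small sketch, assuming akka-stream 2.5.x and with illustrative names:

```scala
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

object SinkSketch {
  def run(): (Int, Int) = {
    implicit val sys: ActorSystem = ActorSystem("sink-sketch")
    implicit val mat: ActorMaterializer = ActorMaterializer()
    try {
      val numbers = Source(1 to 10)
      // runWith returns the materialized value of the sink it is given.
      val sum: Future[Int]   = numbers.runWith(Sink.fold[Int, Int](0)(_ + _))
      val first: Future[Int] = numbers.runWith(Sink.head[Int])
      (Await.result(sum, 3.seconds), Await.result(first, 3.seconds))
    } finally sys.terminate()
  }
}
```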
What does it look like?
FizzBuzz
- Task:
Write a program that prints the integers from 1 to 1000 (inclusive).
But:
- for multiples of three, print Fizz (instead of the number)
- for multiples of five, print Buzz (instead of the number)
- for multiples of both three and five, print FizzBuzz (instead of the number)
FizzBuzz: Start
- Create a minimal runnable flow
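import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}

object FizzBuzz extends App {
  implicit val sys = ActorSystem("fizzbuzz")
  implicit val mat = ActorMaterializer()

  val rangeSource = Source(1 to 1000)
  val printlnSink = Sink.foreach[Int](println)

  // The stream runs asynchronously, so terminate the actor system only
  // once the sink reports completion (otherwise the stream may be cut short).
  rangeSource
    .runWith(printlnSink)
    .onComplete(_ => sys.terminate())(sys.dispatcher)
}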
- Source from a range of Int
- Sink that performs println(…)
FizzBuzz: Flow
- Add 'FizzBuzz' detector as transformation step
object FizzBuzz extends App {
  // ...
  val fizzBuzzFlow = Flow[Int].map {
    case i if i % 15 == 0 => "FizzBuzz"
    case i if i % 5 == 0  => "Buzz"
    case i if i % 3 == 0  => "Fizz"
    case i                => i.toString
  }
  // ...
  rangeSource
    .via(fizzBuzzFlow) // New step added! Elements are now String, so printlnSink must accept String
    .to(printlnSink)
  // ...
}
- Flow takes a simple function:
Int => String
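Because the flow wraps an ordinary function, the logic can be written and checked without Akka and lifted into a stream afterwards. A sketch, assuming akka-stream 2.5.x; the object name and the 1 to 15 range are chosen only for illustration:

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, Sink, Source}
import scala.concurrent.Await
import scala.concurrent.duration._

object FizzBuzzFunctionSketch {
  // The plain function at the heart of the flow: no Akka types involved.
  val fizzBuzz: Int => String = {
    case i if i % 15 == 0 => "FizzBuzz"
    case i if i % 5 == 0  => "Buzz"
    case i if i % 3 == 0  => "Fizz"
    case i                => i.toString
  }

  // Lifting it into a Flow; Flow[Int].map(fizzBuzz) would be equivalent.
  val fizzBuzzFlow: Flow[Int, String, NotUsed] = Flow.fromFunction(fizzBuzz)

  def run(): Seq[String] = {
    implicit val sys: ActorSystem = ActorSystem("fizzbuzz-function-sketch")
    implicit val mat: ActorMaterializer = ActorMaterializer()
    try Await.result(Source(1 to 15).via(fizzBuzzFlow).runWith(Sink.seq), 3.seconds)
    finally sys.terminate()
  }
}
```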
Akka Streams Primer (cont'd)
- Graph is a processing stage built from Source, Flow, and Sink
- RunnableGraph is a graph with no open inputs or outputs: a closed shape that is ready to run
FizzBuzz: Compose
- Create composites by combining shapes together
object FizzBuzz extends App {
  // ...
  val nestedSource = rangeSource.via(fizzBuzzFlow) // Nest the source and flow
  // ...
  val nestedFlow = prefixFlow.via(suffixFlow).via(uppercaseFlow) // Nest FizzBuzz transformations
  val nestedSink = nestedFlow.to(printlnSink) // Nest transformations and sink
  nestedSource
    .to(nestedSink)
    .run()
  // ...
}
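The slide elides the definitions of prefixFlow, suffixFlow, and uppercaseFlow. The sketch below fills them with hypothetical stand-ins (the real ones may differ) to show that a chain of flows is itself a flow and can be named and reused. It assumes akka-stream 2.5.x.

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, Sink, Source}
import scala.concurrent.Await
import scala.concurrent.duration._

object ComposeSketch {
  // Hypothetical stand-ins for the flows elided on the slide.
  val prefixFlow    = Flow[String].map(s => s"> $s")
  val suffixFlow    = Flow[String].map(s => s"$s!")
  val uppercaseFlow = Flow[String].map(_.toUpperCase)

  // Flow via Flow is still a Flow with one input and one output.
  val nestedFlow: Flow[String, String, NotUsed] =
    prefixFlow.via(suffixFlow).via(uppercaseFlow)

  def run(): Seq[String] = {
    implicit val sys: ActorSystem = ActorSystem("compose-sketch")
    implicit val mat: ActorMaterializer = ActorMaterializer()
    try Await.result(Source(List("Fizz", "7", "Buzz")).via(nestedFlow).runWith(Sink.seq), 3.seconds)
    finally sys.terminate()
  }
}
```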
FizzBuzz: Visualise
- GraphDSL helps to model (more) complex flows
object FizzBuzz extends App {
  // ...
  val graph = GraphDSL.create() { implicit builder =>
    // ...
    import GraphDSL.Implicits._
    rangeSource ~> fizzBuzzFlow ~> prefixFlow ~> suffixFlow ~> uppercaseFlow ~> printlnSink
    ClosedShape
  }
  RunnableGraph.fromGraph(graph)
    .run()
  // ...
}
FizzBuzz: Combine
- PartialGraph can be linked to other graphs or shapes
object FizzBuzz extends App {
  // ...
  val graph = GraphDSL.create() { implicit builder =>
    // ...
    import GraphDSL.Implicits._
    SourceGraph.g ~> TransformGraph.g ~> sink
    ClosedShape
  }
  RunnableGraph.fromGraph(graph)
    .run()
  // ...
}
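SourceGraph.g and TransformGraph.g are not shown on the slide, so the following is only a sketch of what a partial graph can look like, not a reconstruction of them. It builds a graph with one open input and one open output (a FlowShape) and wraps it as a Flow so that it links to other shapes like any built-in step. The squareAndNegate name and its logic are made up; akka-stream 2.5.x is assumed.

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, FlowShape}
import akka.stream.scaladsl.{Broadcast, Flow, GraphDSL, Sink, Source, Zip}
import scala.concurrent.Await
import scala.concurrent.duration._

object PartialGraphSketch {
  // A partial graph: internally it fans out, transforms each branch, and zips
  // the results, but from the outside it is just a Flow[Int, (Int, Int), NotUsed].
  val squareAndNegate: Flow[Int, (Int, Int), NotUsed] =
    Flow.fromGraph(GraphDSL.create() { implicit builder =>
      import GraphDSL.Implicits._
      val bcast = builder.add(Broadcast[Int](2))
      val zip   = builder.add(Zip[Int, Int]())
      bcast.out(0).map(n => n * n) ~> zip.in0
      bcast.out(1).map(n => -n)    ~> zip.in1
      FlowShape(bcast.in, zip.out) // the ports left open for other graphs to connect to
    })

  def run(): Seq[(Int, Int)] = {
    implicit val sys: ActorSystem = ActorSystem("partial-graph-sketch")
    implicit val mat: ActorMaterializer = ActorMaterializer()
    try Await.result(Source(1 to 3).via(squareAndNegate).runWith(Sink.seq), 3.seconds)
    finally sys.terminate()
  }
}
```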
Fan-out
- Broadcast[T] (1 input, N outputs)
- Balance[T] (1 input, N outputs)
- UnzipWith[In, A, B, ...] (1 input, N outputs)
- Unzip[A, B] (1 input, 2 outputs)
Fan-in
- Merge[In] (N inputs, 1 output)
- MergePreferred[In] (N inputs, 1 output)
- MergePrioritized[In] (N inputs, 1 output)
- ZipWith[A, B, ...] (N inputs, 1 output)
- Zip[A, B] (2 inputs, 1 output)
- Concat[A] (2 inputs, 1 output)
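A minimal closed graph combining one fan-out and one fan-in junction, assuming akka-stream 2.5.x. The branch logic is made up for illustration. Merge emits elements in whatever order they arrive, so the sketch sorts the result before returning it.

```scala
import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, ClosedShape}
import akka.stream.scaladsl.{Broadcast, GraphDSL, Merge, RunnableGraph, Sink, Source}
import scala.concurrent.Await
import scala.concurrent.duration._

object FanOutFanInSketch {
  def run(): Seq[Int] = {
    implicit val sys: ActorSystem = ActorSystem("fan-sketch")
    implicit val mat: ActorMaterializer = ActorMaterializer()
    // Passing Sink.seq to create() exposes its materialized value from graph.run().
    val graph = RunnableGraph.fromGraph(GraphDSL.create(Sink.seq[Int]) { implicit builder => sink =>
      import GraphDSL.Implicits._
      val bcast = builder.add(Broadcast[Int](2)) // fan-out: every element goes to both branches
      val merge = builder.add(Merge[Int](2))     // fan-in: elements pass through as they arrive
      Source(1 to 3) ~> bcast.in
      bcast.out(0)              ~> merge.in(0)
      bcast.out(1).map(_ * 100) ~> merge.in(1)
      merge.out ~> sink.in
      ClosedShape
    })
    try Await.result(graph.run(), 3.seconds).sorted
    finally sys.terminate()
  }
}
```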
FizzBuzz: Enhance!
- Use predefined shapes to create complex flows
object FizzBuzz extends App {
  // ...
  val graph = GraphDSL.create() { implicit builder =>
    // ...
    import GraphDSL.Implicits._
    SourceGraph.g ~> TransformGraph.g ~> sink
    ClosedShape
  }
  RunnableGraph.fromGraph(graph)
    .run()
  // ...
}
Visual > Textual: Code
Visual > Textual: Graph
What's out there?
Current Solutions
- Streaming Engine
- Streaming Libraries
- Streaming Applications
- IoT
- DSL
- Data Pipeline
- Online Machine Learning
- Stream SQL
- Toolkit
- etc.
Java? (╯°Д°)╯︵ /(.□ . \)
Flow-Based Libraries
- DSPatch (C++)
http://flowbasedprogramming.com/DSPatch/index.html
- GoFlow (Go)
https://github.com/trustmaster/goflow
- Flowex (Elixir)
https://github.com/antonmi/flowex
Can I write *even* less code?
NoFlo https://noflojs.org/
- JavaScript implementation of Flow-Based Programming
- Runs in the browser or on Node.js
- Can be written in any language that transpiles into JavaScript
Pyroclast http://pyroclast.io/
- PaaS for real-time event streaming applications
- Clojure and ClojureScript