Streaming Applications with Akka Streams

GeekCamp Indonesia - 15 July 2017

About Me

  • Senior Software Engineer at Citadel Technology Solutions

 

  • Currently working in:
    • Scala
    • Kotlin

 

  • Currently 'spiking' in:
    • Elixir
    • Elm
    • Dart
  • Giving back to the community:
    • OSS project maintainer
    • Singapore Scala Meetup group organiser
    • Engineers.SG volunteer

 

               _hhandoko

               hhandoko

               hhandoko

               hhandoko.com

Engineers.SG

  • Community initiative to help document Singapore's tech and startup scene

 

  • 1800+ videos of local Meetups, conferences, and other developer events

 

Who? What?

Target Audience

  • Anyone interested in streaming applications or stream processing:
    • Developers
    • Solutions Architects
    • Product Managers
    • etc.

 

  • Helpful to have some programming experience, but no prior Scala or Akka knowledge necessary

Agenda and Objectives

  • Let's agree on some terms and definitions...

 

  • What problems are streaming applications solving?

 

  • What can Akka offer stream processing?

 

  • Show me the money! (or just a demo...)

 

  • What else is out there?

Do you mean...?

Streams

  • A sequence of data elements made available over time

 

  • Processed differently from batch data

 

  • Streams are codata (potentially unlimited / infinite)
  • Streams are everywhere:
    • Event streams
    • Real-time metrics
    • Streaming media
    • etc.

 

Stream Processing

"Given a sequence of data (a stream), a series of operations is applied to each element in the stream."

  • A computer programming paradigm:
    • Dataflow programming
    • Event stream processing
    • Reactive programming

 

  • Think about how the map operation works on a collection
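
To make the comparison concrete, here is a minimal sketch using only the Scala standard library (the object and method names are made up for illustration). The same map call works on a finite collection and on an unbounded iterator; in the second case it is applied lazily, one element at a time.

```scala
object MapDemo {
  // Batch: the whole collection exists up front; map returns a new collection.
  def batch(): List[Int] = List(1, 2, 3).map(_ * 2)

  // Stream-like: an unbounded iterator. map is applied lazily, per element,
  // and only to the elements that are actually pulled by take(3).
  def streaming(): List[Int] = Iterator.from(1).map(_ * 2).take(3).toList

  def main(args: Array[String]): Unit = {
    println(batch())
    println(streaming())
  }
}
```

Both calls produce the same three doubled values, but only the first one needed the full input to exist beforehand.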

Streaming (Data) Application

"A non-hard real-time system that makes its data available at the moment a client application needs it."

[1] - Psaltis, A.G., 2017, Streaming Data, Manning Publications, pp. 8-9

Fast Data

"Depending on use types, the speed at which organizations can convert data into insight and then to action is considered just as critical as the ability to leverage big data, if not more so. In fact, more than half (54%) of respondents stated that they consider leveraging fast data to be more important than leveraging big data."

Big Data or Fast Data

  Fast Data                       Big Data
  • Infinite / ephemeral flow     • Finite
  • Per-element                   • Batch
  • Tactical                      • Strategic
  • Proactive                     • Reactive
  • Data in-motion                • Data at rest

Big Data and Fast Data

What's all this?

Akka

"Coarse-grained concurrency library and runtime, emphasizing actor-based concurrency with inspiration drawn from Erlang."

  • Actors are stateful entities which communicate via message passing:
    • Concurrent and parallel
    • Asynchronous and non-blocking
    • Supervision and monitoring
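
As a rough illustration of a stateful actor driven purely by messages, here is a small sketch using the classic Akka actor API (Akka 2.5.x assumed). The Counter actor and its Increment / GetCount messages are hypothetical names invented for this example.

```scala
import akka.actor.{Actor, ActorSystem, Props}
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.Await
import scala.concurrent.duration._

object CounterDemo {
  // Hypothetical message protocol for this sketch
  case object Increment
  case object GetCount

  // State is private to the actor and only changes in response to messages,
  // which are processed one at a time.
  class Counter extends Actor {
    private var count = 0
    def receive: Receive = {
      case Increment => count += 1
      case GetCount  => sender() ! count
    }
  }

  def run(): Int = {
    val system = ActorSystem("counter-demo")
    implicit val timeout: Timeout = Timeout(3.seconds)
    try {
      val counter = system.actorOf(Props(new Counter), "counter")
      counter ! Increment // fire-and-forget, non-blocking
      counter ! Increment
      // ask (?) returns a Future; we block here only to keep the demo short
      Await.result((counter ? GetCount).mapTo[Int], 3.seconds)
    } finally {
      Await.result(system.terminate(), 10.seconds)
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```

The caller never touches count directly; it can only send messages and wait for a reply.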

Actor and Streams

  • Actors model stream processing well:
    • Receive (and send) messages
    • Use a (bounded) mailbox
    • Process messages sequentially
  • However, not without challenges:
    • Buffer (and mailbox) overflows
    • Wiring errors
    • Hard to conceptualise flow at higher level
    • Actors do not compose like normal functions

Akka Streams

  • Provides a way to express and run a chain of async processing steps acting on a sequence of elements

 

  • Frees developers to think about the bigger picture, composing a pipeline of functions (with actors)
  • Bounded resource usage via Reactive Streams
    • Limit buffering
    • Slow down producers if consumers cannot keep up (backpressure)

Reactive Streams

  • Initiative to provide a standard for async stream processing

 

  • In essence:
    • Process a potentially unbounded number of elements
    • in a sequence
    • asynchronously passing elements between components
    • with mandatory non-blocking backpressure

Backpressure

  • Signalling: the subscriber notifies the producer of its demand

 

  • Ensures the publisher emits messages no faster than the subscriber can consume them
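
The demand signalling can be seen directly at the Reactive Streams interface level. The sketch below assumes Akka 2.5.x and exposes an Akka Streams source as an org.reactivestreams.Publisher; the OneAtATime subscriber is a hypothetical class written for this example. It requests exactly one element at a time, so the publisher can never get ahead of it.

```scala
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}
import org.reactivestreams.{Subscriber, Subscription}
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration._

object DemandDemo {
  // Hypothetical subscriber: asks for exactly one element at a time.
  class OneAtATime(done: Promise[Vector[Int]]) extends Subscriber[Int] {
    private var subscription: Subscription = _
    private var received = Vector.empty[Int]

    override def onSubscribe(s: Subscription): Unit = {
      subscription = s
      s.request(1) // initial demand: nothing is emitted before this call
    }
    override def onNext(elem: Int): Unit = {
      received :+= elem
      subscription.request(1) // signal demand for the next element only now
    }
    override def onError(t: Throwable): Unit = done.tryFailure(t)
    override def onComplete(): Unit = done.trySuccess(received)
  }

  def run(): Vector[Int] = {
    implicit val system = ActorSystem("demand-demo")
    implicit val mat = ActorMaterializer()
    try {
      val done = Promise[Vector[Int]]()
      // Expose the stream as a plain Reactive Streams Publisher
      val publisher = Source(1 to 5).runWith(Sink.asPublisher[Int](fanout = false))
      publisher.subscribe(new OneAtATime(done))
      Await.result(done.future, 5.seconds)
    } finally {
      Await.result(system.terminate(), 10.seconds)
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```

In everyday Akka Streams code this handshake is hidden; Source, Flow, and Sink stages exchange demand on your behalf.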

Akka Streams Primer

ActorSystem

  • A hierarchical group of actors which share common configuration, e.g. dispatchers, deployments, remote capabilities and addresses

 

  • The entry point for creating or looking up actors

Materializer

  • The magic behind the scenes

 

  • Converts a list of akka.stream.scaladsl.Flow into org.reactivestreams.Processor instances

 

  • Applies 'Operator Fusion' optimisations
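
One way to see the materializer at work, sketched under the assumption of Akka 2.5.x (object and value names are illustrative): a stream definition is only a blueprint, and each run() materializes a fresh, independent running stream together with its materialized value.

```scala
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Keep, RunnableGraph, Sink, Source}
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

object MaterializeDemo {
  def run(): (Int, Int) = {
    implicit val system = ActorSystem("materialize-demo")
    implicit val mat = ActorMaterializer()
    try {
      // A blueprint only: nothing runs when this value is defined.
      // Keep.right selects the sink's materialized value (the eventual sum).
      val sumGraph: RunnableGraph[Future[Int]] =
        Source(1 to 10).toMat(Sink.fold[Int, Int](0)(_ + _))(Keep.right)

      // Each run() asks the implicit materializer to start a new stream.
      val first  = Await.result(sumGraph.run(), 5.seconds)
      val second = Await.result(sumGraph.run(), 5.seconds)
      (first, second)
    } finally {
      Await.result(system.terminate(), 10.seconds)
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```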

Source[+Out, M1]

  • The starting point of the stream, where the data flowing through the stream originates from
val sourceFromRange = Source(1 to 1000)
val sourceFromIterable = Source(List(1,2,3))
val sourceFromFuture = Source.fromFuture(Future.successful("hello"))
val sourceWithSingleElement = Source.single("just one")
val sourceEmittingTheSameElement = Source.repeat("again and again")
val emptySource = Source.empty[String] // element type given explicitly, otherwise Nothing is inferred
  • Has one output but no input

Flow[-In, +Out, M2]

  • A processing step within the stream, which combines one incoming channel and one outgoing channel and applies some transformation
val flowDoublingElements = Flow[Int].map(_ * 2)
val flowFilteringOutOddElements = Flow[Int].filter(_ % 2 == 0)
val flowBatchingElements = Flow[Int].grouped(10)
val flowBufferingElements = Flow[String].buffer(1000, OverflowStrategy.backpressure)
  • Has one input and one output

Sink[-In, M3]

  • The ultimate destination of all the messages flowing through the stream
val sinkPrintingOutElements = Sink.foreach[String](println(_))
val sinkCalculatingASumOfElements = Sink.fold[Int, Int](0)(_ + _)
val sinkReturningTheFirstElement = Sink.head[Int] // element type given explicitly, otherwise Nothing is inferred
val sinkNoop = Sink.ignore
  • Has one input but no output
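
Putting the three building blocks together, a minimal sketch (Akka 2.5.x assumed; the value names are illustrative) that sums the doubled even numbers from 1 to 10:

```scala
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, Sink, Source}
import scala.concurrent.Await
import scala.concurrent.duration._

object PrimerDemo {
  def run(): Int = {
    implicit val system = ActorSystem("primer-demo")
    implicit val mat = ActorMaterializer()
    try {
      val source = Source(1 to 10)                // one output, no input
      val evens  = Flow[Int].filter(_ % 2 == 0)   // one input, one output
      val double = Flow[Int].map(_ * 2)           // one input, one output
      val sum    = Sink.fold[Int, Int](0)(_ + _)  // one input, no output

      // via attaches a Flow, runWith attaches the Sink and starts the stream
      val result = source.via(evens).via(double).runWith(sum)
      Await.result(result, 5.seconds)
    } finally {
      Await.result(system.terminate(), 10.seconds)
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```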

What does it look like?

FizzBuzz

  • Task:
    Write a program that prints the integers from 1 to 1000 (inclusive).

    But:
    • for multiples of three, print Fizz (instead of the number)
    • for multiples of five, print Buzz (instead of the number)
    • for multiples of both three and five, print FizzBuzz (instead of the number)

FizzBuzz: Start

  • Create a minimal runnable flow
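import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}

object FizzBuzz extends App {
  implicit val sys = ActorSystem("fizzbuzz")
  implicit val mat = ActorMaterializer()
  implicit val ec  = sys.dispatcher // ExecutionContext for onComplete

  val rangeSource  = Source(1 to 1000)
  val printlnSink  = Sink.foreach[Int](println)

  // The stream runs asynchronously: terminate the ActorSystem only once
  // the sink's Future[Done] has completed, not right after starting it
  rangeSource
    .runWith(printlnSink)
    .onComplete(_ => sys.terminate())
}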
  • Source from a range of Int
  • Sink that performs println(…)

FizzBuzz: Flow

  • Add 'FizzBuzz' detector as transformation step
object FizzBuzz extends App {
  // ...
  val fizzBuzzFlow = Flow[Int].map {
    case i if i % 15 == 0 => "FizzBuzz"
    case i if i % 5 == 0  => "Buzz"
    case i if i % 3 == 0  => "Fizz"
    case i                => i.toString
  }
  // The flow emits String, so the sink must now accept String instead of Int
  val printlnSink = Sink.foreach[String](println)
  // ...
  rangeSource
    .via(fizzBuzzFlow) // New step added!
    .to(printlnSink)
  // ...
}
  • Flow takes a simple function:
    Int => String
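
Because the flow wraps a plain function, that function can be pulled out and checked on its own before it goes anywhere near a stream. A sketch of that idea, assuming Akka 2.5.x (FizzBuzzFn and its members are illustrative names):

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, Sink, Source}
import scala.concurrent.Await
import scala.concurrent.duration._

object FizzBuzzFn {
  // Plain function: no Akka involved, trivially unit-testable
  def fizzBuzz(i: Int): String = i match {
    case _ if i % 15 == 0 => "FizzBuzz"
    case _ if i % 5 == 0  => "Buzz"
    case _ if i % 3 == 0  => "Fizz"
    case _                => i.toString
  }

  // Lifting the function into a reusable Flow stage
  def fizzBuzzFlow: Flow[Int, String, NotUsed] = Flow[Int].map(fizzBuzz)

  def run(): Seq[String] = {
    implicit val system = ActorSystem("fizzbuzz-fn")
    implicit val mat = ActorMaterializer()
    try {
      Await.result(Source(1 to 15).via(fizzBuzzFlow).runWith(Sink.seq), 5.seconds)
    } finally {
      Await.result(system.terminate(), 10.seconds)
    }
  }

  def main(args: Array[String]): Unit = run().foreach(println)
}
```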

Akka Streams Primer (cont'd)

  • Graph is a processing stage built from Source, Flow, and Sink

 

 

  • RunnableGraph is a graph with no open inputs or outputs: a closed shape, ready to run

FizzBuzz: Compose

  • Create composites by combining shapes together
object FizzBuzz extends App {
  // ...
  val nestedSource = rangeSource.via(fizzBuzzFlow) // Nest the source and flow
  // ...
  val nestedFlow   = prefixFlow.via(suffixFlow).via(uppercaseFlow) // Nest FizzBuzz transformations
  val nestedSink   = nestedFlow.to(printlnSink) // Nest transformations and sink

  nestedSource
    .to(nestedSink)
    .run()
  // ...
}
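
The slide elides the bodies of prefixFlow, suffixFlow, and uppercaseFlow. The sketch below fills them in with made-up transformations (an assumption for illustration only, not the talk's actual code) to show that a chain of flows is itself just a Flow. Akka 2.5.x assumed.

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, Sink, Source}
import scala.concurrent.Await
import scala.concurrent.duration._

object ComposeDemo {
  // Hypothetical bodies: the original slide does not show these three flows
  def prefixFlow: Flow[String, String, NotUsed]    = Flow[String].map(s => s"> $s")
  def suffixFlow: Flow[String, String, NotUsed]    = Flow[String].map(s => s"$s!")
  def uppercaseFlow: Flow[String, String, NotUsed] = Flow[String].map(_.toUpperCase)

  def run(): Seq[String] = {
    implicit val system = ActorSystem("compose-demo")
    implicit val mat = ActorMaterializer()
    try {
      // Flow via Flow is still a Flow, so the composite can be reused anywhere
      val nestedFlow: Flow[String, String, NotUsed] =
        prefixFlow.via(suffixFlow).via(uppercaseFlow)

      Await.result(Source(List("Fizz", "7")).via(nestedFlow).runWith(Sink.seq), 5.seconds)
    } finally {
      Await.result(system.terminate(), 10.seconds)
    }
  }

  def main(args: Array[String]): Unit = run().foreach(println)
}
```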

FizzBuzz: Visualise

  • GraphDSL helps to model (more) complex flows
object FizzBuzz extends App {
  // ...
  val graph = GraphDSL.create() { implicit builder =>
    // ...
    import GraphDSL.Implicits._
    rangeSource ~> fizzBuzzFlow ~> prefixFlow ~> suffixFlow ~> uppercaseFlow ~> printlnSink

    ClosedShape
  }

  RunnableGraph.fromGraph(graph)
    .run()
  // ...
}

FizzBuzz: Combine

  • PartialGraph can be linked to other graphs or shapes
object FizzBuzz extends App {
  // ...
  val graph = GraphDSL.create() { implicit builder =>
    // ...
    import GraphDSL.Implicits._
    SourceGraph.g ~> TransformGraph.g ~> sink

    ClosedShape
  }

  RunnableGraph.fromGraph(graph)
    .run()
  // ...
}
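
SourceGraph.g and TransformGraph.g are not shown on the slide. As a rough idea of what a partial graph can look like, the sketch below builds one with an open inlet and outlet (a FlowShape) and then links it like any other Flow. The stage names and transformations are assumptions for illustration. Akka 2.5.x assumed.

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, FlowShape}
import akka.stream.scaladsl.{Flow, GraphDSL, Sink, Source}
import scala.concurrent.Await
import scala.concurrent.duration._

object PartialGraphDemo {
  // A partial graph: one inlet and one outlet are left open (FlowShape)
  def transform: Flow[Int, String, NotUsed] =
    Flow.fromGraph(GraphDSL.create() { implicit builder =>
      import GraphDSL.Implicits._
      val double    = builder.add(Flow[Int].map(_ * 2))
      val stringify = builder.add(Flow[Int].map(_.toString))

      double ~> stringify

      // Expose the unconnected ports as the shape of this partial graph
      FlowShape(double.in, stringify.out)
    })

  def run(): Seq[String] = {
    implicit val system = ActorSystem("partial-graph-demo")
    implicit val mat = ActorMaterializer()
    try {
      // The partial graph links to other stages like a regular Flow
      Await.result(Source(1 to 3).via(transform).runWith(Sink.seq), 5.seconds)
    } finally {
      Await.result(system.terminate(), 10.seconds)
    }
  }

  def main(args: Array[String]): Unit = run().foreach(println)
}
```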

Fan-out

  • Broadcast[T]
    (1 input, N outputs)
  • Balance[T]
    (1 input, N outputs)
  • UnzipWith[In, A, B, ...]
    (1 input, N outputs)
  • Unzip[A, B]
    (1 input, 2 outputs)

Fan-in

  • Merge[In]
    (N inputs, 1 output)
  • MergePreferred[In]
    (N inputs, 1 output)
  • MergePrioritized[In]
    (N inputs, 1 output)
  • ZipWith[A, B, ...]
    (N inputs, 1 output)
  • Zip[A, B]
    (2 inputs, 1 output)
  • Concat[A]
    (2 inputs, 1 output)
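
A small sketch of a fan-out junction feeding a fan-in junction, assuming Akka 2.5.x (names and multipliers are illustrative). Broadcast copies each element to both branches and Merge collects whatever arrives first, so the output order across branches is not guaranteed.

```scala
import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, ClosedShape}
import akka.stream.scaladsl.{Broadcast, Flow, GraphDSL, Merge, RunnableGraph, Sink, Source}
import scala.concurrent.Await
import scala.concurrent.duration._

object FanDemo {
  def run(): Seq[Int] = {
    implicit val system = ActorSystem("fan-demo")
    implicit val mat = ActorMaterializer()
    try {
      val graph = RunnableGraph.fromGraph(
        GraphDSL.create(Sink.seq[Int]) { implicit builder => sink =>
          import GraphDSL.Implicits._
          val broadcast = builder.add(Broadcast[Int](2)) // 1 input, 2 outputs
          val merge     = builder.add(Merge[Int](2))     // 2 inputs, 1 output

          Source(1 to 3) ~> broadcast
          broadcast ~> Flow[Int].map(_ * 10)  ~> merge
          broadcast ~> Flow[Int].map(_ * 100) ~> merge
          merge ~> sink.in

          ClosedShape
        })
      // The materialized value is the Future from Sink.seq
      Await.result(graph.run(), 5.seconds)
    } finally {
      Await.result(system.terminate(), 10.seconds)
    }
  }

  def main(args: Array[String]): Unit = println(run().sorted)
}
```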

FizzBuzz: Enhance!

  • Use predefined shapes to create complex flows
object FizzBuzz extends App {
  // ...
  val graph = GraphDSL.create() { implicit builder =>
    // ...
    import GraphDSL.Implicits._
    SourceGraph.g ~> TransformGraph.g ~> sink

    ClosedShape
  }

  RunnableGraph.fromGraph(graph)
    .run()
  // ...
}
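
One possible reading of "enhance", offered as an assumption rather than the talk's actual graph: split FizzBuzz into separate Fizz and Buzz detectors with Broadcast, then recombine the branches with ZipWith. Akka 2.5.x assumed; all names are illustrative.

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, FlowShape}
import akka.stream.scaladsl.{Broadcast, Flow, GraphDSL, Sink, Source, ZipWith}
import scala.concurrent.Await
import scala.concurrent.duration._

object FanFizzBuzz {
  def fizzBuzzFlow: Flow[Int, String, NotUsed] =
    Flow.fromGraph(GraphDSL.create() { implicit b =>
      import GraphDSL.Implicits._
      val bcast = b.add(Broadcast[Int](3))
      // Combine the number with both detector results, element by element
      val zip = b.add(ZipWith[Int, String, String, String] { (i, fizz, buzz) =>
        val word = fizz + buzz
        if (word.isEmpty) i.toString else word
      })

      bcast.out(0) ~> zip.in0
      bcast.out(1) ~> Flow[Int].map(i => if (i % 3 == 0) "Fizz" else "") ~> zip.in1
      bcast.out(2) ~> Flow[Int].map(i => if (i % 5 == 0) "Buzz" else "") ~> zip.in2

      FlowShape(bcast.in, zip.out)
    })

  def run(): Seq[String] = {
    implicit val system = ActorSystem("fan-fizzbuzz")
    implicit val mat = ActorMaterializer()
    try {
      Await.result(Source(1 to 15).via(fizzBuzzFlow).runWith(Sink.seq), 5.seconds)
    } finally {
      Await.result(system.terminate(), 10.seconds)
    }
  }

  def main(args: Array[String]): Unit = run().foreach(println)
}
```

Unlike Merge, ZipWith waits for one element on every input before emitting, so the output stays in the original order.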

Visual > Textual: Code

Visual > Textual: Graph

What's out there?

Current Solutions

  • Streaming Engine
  • Streaming Libraries
  • Streaming Applications
  • IoT
  • DSL
  • Data Pipeline
  • Online Machine Learning
  • Stream SQL
  • Toolkit
  • etc.

Java? (╯°Д°)╯︵ /(.□ . \)

Flow-Based Libraries

Can I write *even* less code?

  • JavaScript implementation of Flow-Based Programming

 

  • Web or NodeJs

 

  • Can be written in any language that transpiles into JavaScript
  • PaaS for real-time event streaming applications
  • Clojure and ClojureScript

Thanks!

Streaming Applications

By Herdy Handoko


Presentation on Akka Streams, prepared for GeekCamp Indonesia on 15 July 2017. Link: https://www.vidio.com/watch/793734-streaming-applications-with-akka-streams-herdy-handoko
