Understanding Monix Observable

Piotr Gawryś

About me

An open source contributor for fun
One of the maintainers of Monix
Kraków Scala User Group co-organizer

https://github.com/Avasil

twitter.com/p_gawrys

Monix

Scala / Scala.js library for asynchronous programming
Multiple modules exposing Task, IO[E, A], Observable, Iterant, Coeval, Local, and many concurrency primitives
Favors purely functional programming but provides for all
Big focus on being both Future, and

Monix Observable

Inspired by RxJava / ReactiveX
Push-based with back-pressure
See Alex's presentation for origins: https://monix.io/presentations/2018-tale-two-monix-streams.html
Cold (single subscriber) streams are purely functional

High Level Example

val result: Task[Long] = 
  Observable.fromIterable(allElements)
    .bufferTumbling(bufferSize)
    .mapEval(seq => Task(seq.sum))
    .filter(_ > 0)
    .map(_.toLong)
    .foldLeftL(0L)(_ + _)

High Level Example

val playerInputs: ConcurrentQueue[MovementCommand] = ???

val gameStateObservable: Observable[GameState] =
  Observable(initialState) ++
    Observable
      .repeatEvalF(playerInputs.poll) // get inputs
      .groupBy(_.playerId) // create a sub-stream per playerId
      // emit only the first element every 150.millis per sub-stream
      // and merge them concurrently to one stream
      .mergeMap(_.throttleFirst(150.millis)) 
      .bufferTimed(150.millis) // collect results every 150.millis
      .scan0(initialState) { // a state machine to update the latest GameState
        case (GameState(players, bullets, environment), commands) =>
          val (updatedPlayers, updatedBullets) =
            moveTank(players, bullets, environment, commands)
          val newGameState =
            resolveCollisions(
              GameState(updatedPlayers, updatedBullets, environment)
            )

          newGameState
      }

Today, we're going to talk about internals!

Definition

trait Observer[-A] {
  def onNext(elem: A): Future[Ack]

  def onError(ex: Throwable): Unit

  def onComplete(): Unit
}


// Needs some kind of ExecutionContext to do 
// anything with onNext (which returns Future)
trait Subscriber[-A] extends Observer[A] {
  implicit def scheduler: Scheduler
}

abstract class Observable[+A] {
  def unsafeSubscribeFn(subscriber: Subscriber[A]): Cancelable
}

[Diagram: Observable and Observer, with onNext events flowing from the Observable to the Observer]

Observer#onNext protocol

trait Observer[-A] {
  def onNext(elem: A): Future[Ack]
}

sealed abstract class Ack extends Future[Ack]
case object Continue extends Ack
case object Stop extends Ack

Grammar: onNext CAN be called zero, one or multiple times until onComplete, or onError
Back-pressure: each onNext call MUST wait for a Continue result from the previous call
Cancellation: after receiving Stop the data-source MUST no longer send any events
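
The back-pressure and cancellation rules above can be sketched with the standard library alone. This is a simplified illustration, not Monix code: SimpleAck, Continue, Stop and feed are hypothetical stand-ins defined locally, and the real Ack additionally extends Future[Ack].

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object BackPressureSketch {
  // Local stand-ins for the acknowledgement values (NOT the Monix types)
  sealed trait SimpleAck
  case object Continue extends SimpleAck
  case object Stop extends SimpleAck

  // Feeds `items` to `onNext` one at a time. The next element is sent only
  // after the previous call's Future completed with Continue (back-pressure),
  // and nothing is sent after a Stop (cancellation).
  // Returns the number of elements that were sent.
  def feed[A](items: List[A])(onNext: A => Future[SimpleAck])(
    implicit ec: ExecutionContext
  ): Future[Int] = {
    def loop(rest: List[A], sent: Int): Future[Int] = rest match {
      case Nil => Future.successful(sent)
      case head :: tail =>
        onNext(head).flatMap {
          case Continue => loop(tail, sent + 1)
          case Stop     => Future.successful(sent + 1)
        }
    }
    loop(items, 0)
  }

  def run(): Int = {
    import ExecutionContext.Implicits.global
    // A consumer that accepts elements smaller than 3 and then stops
    val result = feed(List(1, 2, 3, 4, 5)) { n =>
      Future(if (n < 3) Continue else Stop)
    }
    Await.result(result, 5.seconds)
  }
}
```

Here run() returns 3: the elements 1 and 2 are acknowledged with Continue, 3 is answered with Stop, and 4 and 5 are never sent.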

Observer protocol

trait Observer[-A] {
  def onError(ex: Throwable): Unit
  
  def onComplete(): Unit
}

Grammar: onComplete or onError CAN be called at most once, and never both
Back-pressure: optional, the data-source is not required to wait for the result of the last onNext

Observer protocol

trait Observer[-A] {
  def onNext(elem: A): Future[Ack]

  def onError(ex: Throwable): Unit

  def onComplete(): Unit
}

Ordering: all calls to onNext, onComplete, and onError MUST BE ordered and thus non-concurrent
Exceptions: the Observer's methods are not allowed to throw exceptions
Full contract here https://monix.io/docs/current/reactive/observers.html#contract

Observable

A Subscriber/Observer subscribes to an Observable, which then starts emitting events
Subscribing returns a Cancelable, which allows stopping the computation from the outside

abstract class Observable[+A] {
  def unsafeSubscribeFn(subscriber: Subscriber[A]): Cancelable
}

Simple Observable

final class NowObservable[+A](elem: A) extends Observable[A] {
  def unsafeSubscribeFn(subscriber: Subscriber[A]): Cancelable = {
    // No need to back-pressure for onComplete
    subscriber.onNext(elem)
    subscriber.onComplete()
    // There's no specific action needed in case the connection is canceled
    Cancelable.empty 
  }
}

final class PrintSubscriber[-A] extends Subscriber[A] {
  override def scheduler: Scheduler = Scheduler.global

  override def onNext(elem: A): Future[Ack] = {
    println(s"Received $elem")
    Continue
  }

  override def onError(ex: Throwable): Unit = {
    println(s"Received error $ex")
  }

  override def onComplete(): Unit = {
    println(s"Received final event")
  }
}

Simple Subscriber

val source: Observable[Int] = new NowObservable(10)

val cancelable: Cancelable =
  source.unsafeSubscribeFn(new PrintSubscriber)
  
// => Received 10
// => Received final event

Running Observable

new Observable[Int] {
  def unsafeSubscribeFn(subscriber: Subscriber[Int]): Cancelable = {
    subscriber.onNext(10)
    subscriber.onComplete()
    Cancelable.empty 
  }
}.unsafeSubscribeFn(new Subscriber[Int] {
  override def scheduler: Scheduler = Scheduler.global

  override def onNext(elem: Int): Future[Ack] = {
    println(s"Received $elem")
    Continue
  }

  override def onError(ex: Throwable): Unit = {
    println(s"Received error $ex")
  }

  override def onComplete(): Unit = {
    println(s"Received final event")
  }
})

// => Received 10
// => Received final event

More complicated example

import monix.eval.Task
import monix.reactive.Observable
import scala.concurrent.duration._
import scala.util.Random

val result: Task[List[Int]] =
  Observable.repeatEval(Random.nextInt(10))
    .takeByTimespan(10.second)
    .toListL

Observable.repeatEval

object Observable {    
  def repeatEval[A](task: => A): Observable[A] =
    new RepeatEvalObservable(task)
}

final class RepeatEvalObservable[+A](eval: => A) extends Observable[A] {
  def unsafeSubscribeFn(subscriber: Subscriber[A]): Cancelable = {
    val s = subscriber.scheduler
    val cancelable = BooleanCancelable()
    fastLoop(subscriber, cancelable, s.executionModel, 0)(s)
    cancelable
  }
  
  @tailrec
  def fastLoop(
    o: Subscriber[A], 
    // We might check it periodically to
    // see if the subscription is not cancelled
    c: BooleanCancelable, 
    // Scheduler has ExecutionModel, e.g. Synchronous, Batched, AlwaysAsync
    // We could add async boundaries according to it
    em: ExecutionModel,
    // BatchedExecution model inserts async boundary
    // after N synchronous operations
    syncIndex: Int
  )(implicit s: Scheduler): Unit = ???
}

@tailrec
def fastLoop(
  o: Subscriber[A],
  c: BooleanCancelable, 
  em: ExecutionModel, 
  syncIndex: Int
)(implicit s: Scheduler): Unit = {
  val ack =
    try o.onNext(eval)
    catch {
      case ex if NonFatal(ex) =>
        Future.failed(ex)
    }

  val nextIndex =
    if (ack == Continue) em.nextFrameIndex(syncIndex)
    else if (ack == Stop) -1
    else 0

  if (nextIndex > 0)
    fastLoop(o, c, em, nextIndex)
  else if (nextIndex == 0 && !c.isCanceled)
    reschedule(ack, o, c, em)
}
    
def reschedule(
  ack: Future[Ack], 
  o: Subscriber[A], 
  c: BooleanCancelable, 
  em: ExecutionModel
)(implicit s: Scheduler): Unit =
  ack.onComplete {
    case Success(Continue) =>
      // Resume the synchronous loop once the consumer is ready again
      fastLoop(o, c, em, 0)
    case Success(Stop) =>
      () // the consumer asked us to stop, do nothing
    case Failure(ex) =>
      s.reportFailure(ex)
  }
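
The interplay between a synchronous fast loop and an asynchronous boundary can be illustrated with a small standard-library sketch. It assumes a power-of-two batch size combined with a bit mask; batchSize, nextFrameIndex, loop and reschedule below are hypothetical local definitions that only mimic the shape of the code above, not the actual ExecutionModel implementation.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.annotation.tailrec
import scala.concurrent.{Await, ExecutionContext, Promise}
import scala.concurrent.duration._

object BatchedLoopSketch {
  // Hypothetical batch size; it must be a power of two for the mask to work
  private val batchSize = 4
  private val mask = batchSize - 1

  // Returns 0 whenever a batch boundary is reached, a positive index otherwise
  def nextFrameIndex(current: Int): Int = (current + 1) & mask

  // Processes `remaining` steps; stays on the current thread inside a batch
  @tailrec
  private def loop(
    remaining: Int,
    syncIndex: Int,
    boundaries: AtomicInteger,
    done: Promise[Int]
  )(implicit ec: ExecutionContext): Unit =
    if (remaining == 0) done.success(boundaries.get())
    else {
      val next = nextFrameIndex(syncIndex)
      if (next > 0) loop(remaining - 1, next, boundaries, done)
      else reschedule(remaining - 1, boundaries, done)
    }

  // Inserts an async boundary, then restarts the loop with index 0
  private def reschedule(
    remaining: Int,
    boundaries: AtomicInteger,
    done: Promise[Int]
  )(implicit ec: ExecutionContext): Unit = {
    boundaries.incrementAndGet()
    ec.execute(() => loop(remaining, 0, boundaries, done))
  }

  // Runs `total` steps and returns how many async boundaries were inserted
  def run(total: Int): Int = {
    implicit val ec: ExecutionContext = ExecutionContext.global
    val done = Promise[Int]()
    loop(total, 0, new AtomicInteger(0), done)
    Await.result(done.future, 5.seconds)
  }
}
```

With a batch size of 4, run(10) inserts a boundary after the 4th and the 8th step and therefore returns 2. Keeping reschedule in a separate method leaves every self-call of loop in tail position, which is why @tailrec still applies.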

Observable#toListL

abstract class Observable[+A] {    
  final def toListL: Task[List[A]] =
    foldLeft(mutable.ListBuffer.empty[A])(_ += _)
    // We know for sure that there will be only one element
    .firstOrElseL(mutable.ListBuffer.empty[A])
    .map(_.toList)
        
  final def foldLeft[R](seed: => R)(op: (R, A) => R): Observable[R] = ???
  
  final def firstOrElseL[B >: A](default: => B): Task[B] = ???
}

final class FoldLeftObservable[A, R](
  source: Observable[A], 
  initial: () => R,
  f: (R, A) => R
) extends Observable[R] {
  def unsafeSubscribeFn(out: Subscriber[R]): Cancelable = {
    var streamErrors = true
    try {
      val initialState = initial()
      streamErrors = false

      source.unsafeSubscribeFn(new Subscriber[A] { ... })
    } catch {
      // If an error was thrown in source.unsafeSubscribeFn(...)
      // it is a breach of the protocol and the behavior is undefined
      // but we don't want to call out.onError in case it already happened there
      case NonFatal(ex) if streamErrors =>
        out.onError(ex)
        Cancelable.empty
    }
  }
}

      source.unsafeSubscribeFn(new Subscriber[A] {
        implicit val scheduler = out.scheduler
        // We might call onError in onNext so we need this 
        // flag to protect from potentially calling it twice
        // (once from onNext, once by upstream)
        private[this] var isDone = false
        private[this] var state: R = initialState

        def onNext(elem: A): Ack = {
          try {
            // User-supplied function
            // could throw exception
            state = f(state, elem)
            Continue
          } catch {
            case ex if NonFatal(ex) =>
              onError(ex)
              Stop
          }
        }

        def onComplete(): Unit =
          if (!isDone) {
            isDone = true
            out.onNext(state)
            out.onComplete()
          }

        def onError(ex: Throwable): Unit =
          if (!isDone) {
            isDone = true
            out.onError(ex)
          }
      })

Are those vars thread-safe?

private[this] var isDone = false
private[this] var state: R = initialState

They can be modified and read from a different thread, after all...

but the protocol guarantees that this is safe, and we'll see how!

Are those vars thread-safe?

If we follow the onNext calls, it goes like this:

out.onNext(next).flatMap(_ => out2.onNext).flatMap(_ => out3.onNext) ...
out.onNext(next).flatMap(_ => out2.onNext).flatMap(_ => Continue) ...
out.onNext(next).flatMap(_ => Continue) ...
Continue

And then the next element is sent after Continue is received (remember ack.onComplete in repeatEval's reschedule?)

Internally, each Future might be scheduled on a potentially different Thread with ec.execute():

var isDone = false

ec.execute(() => {
  isDone = true

  // second thread
  ec.execute(() => {
    assert(isDone)
  })
})

This establishes a happens-before relation between writing and reading isDone from potentially different threads.
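
The same reasoning can be tried out with plain Scala Futures. The sketch below is an illustration under the assumption that every step is chained strictly after the previous one, just like ordered onNext calls; HappensBeforeSketch and its counter are hypothetical names, not part of Monix.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object HappensBeforeSketch {
  def run(): Int = {
    import ExecutionContext.Implicits.global

    // A plain var: no @volatile, no locks, no atomics
    var counter = 0

    // Each step starts only after the previous Future has completed,
    // possibly on a different thread of the pool. Completing a Future
    // and running its callback gives the ordering we rely on.
    val chain = (1 to 10000).foldLeft(Future.successful(())) { (acc, _) =>
      acc.map(_ => counter += 1)
    }

    Await.result(chain, 30.seconds)
    counter
  }
}
```

Because the increments are never concurrent and each one is ordered after the previous completion, run() returns 10000. Running the increments as independent Futures instead of a chain would break that guarantee.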

Observable#toListL

abstract class Observable[+A] {    
  final def toListL: Task[List[A]] =
    foldLeft(mutable.ListBuffer.empty[A])(_ += _)
    // We know for sure that there will be only one element
    .firstOrElseL(mutable.ListBuffer.empty[A])
    .map(_.toList)
        
  final def foldLeft[R](seed: => R)(op: (R, A) => R): Observable[R] = 
    new FoldLeftObservable(this, () => seed, op)
  
  final def firstOrElseL[B >: A](default: => B): Task[B] = ???
}

Observable#firstOrElseL

final def firstOrElseL[B >: A](default: => B): Task[B] = 
  Task.create { (s, cb) =>
    unsafeSubscribeFn(new Subscriber[A] {
      implicit val scheduler: Scheduler = s
      private[this] var isDone = false

      def onNext(elem: A): Ack = {
        cb.onSuccess(elem)
        isDone = true
        Stop
      }

      def onError(ex: Throwable): Unit =
        if (!isDone) {
          isDone = true
          cb.onError(ex)
        }

      def onComplete(): Unit =
        if (!isDone) {
          isDone = true
          cb(Try(default))
        }
    })
  }

Observable#firstOrElseL Bonus!

final def firstOrElseLZIOOO[B >: A](default: => B): zio.Task[B] = {
  ZIO.descriptorWith { desc =>
    ZIO.effectAsync { cb =>
      unsafeSubscribeFn(new Subscriber[A] {
        implicit val scheduler: Scheduler =
          Scheduler(desc.executor.asEC)

        private[this] var isDone = false

        def onNext(elem: A): Ack = {
          cb(ZIO.succeed(elem))
          isDone = true
          Stop
        }

        def onError(ex: Throwable): Unit =
          if (!isDone) {
            isDone = true
            cb(ZIO.fail(ex))
          }

        def onComplete(): Unit =
          if (!isDone) {
            isDone = true
            cb(ZIO(default))
          }
      })
    }
  }
}

TakeLeftByTimespanObservable

abstract class Observable[+A] {    
  final def takeByTimespan(timespan: FiniteDuration): Observable[A] =
    new TakeLeftByTimespanObservable(this, timespan)
}

Takes elements from the source until the timespan passes
We can run the source as usual, concurrently with a timeoutTask that will stop the source gracefully

final class TakeLeftByTimespanObservable[A](
  source: Observable[A], 
  timespan: FiniteDuration
) extends Observable[A] {

  def unsafeSubscribeFn(out: Subscriber[A]): Cancelable = {
    val composite = CompositeCancelable()

    composite += source.unsafeSubscribeFn(new Subscriber[A] with Runnable {
      implicit val scheduler = out.scheduler
      private[this] var isActive = true
      private[this] val timeoutTask: Cancelable = {
        val ref = scheduler.scheduleOnce(timespan.length, timespan.unit, this)
        composite += ref
        ref
      }

      def run(): Unit = onComplete()

      private def deactivate(): Unit = synchronized {
        isActive = false
        timeoutTask.cancel()
      }

      def onNext(elem: A): Future[Ack] = synchronized {
        if (isActive) out.onNext(elem).syncOnStopOrFailure(_ => deactivate())
        else Stop
      }

      def onError(ex: Throwable): Unit = synchronized {
        if (isActive) {
          deactivate()
          out.onError(ex)
        }
      }

      def onComplete(): Unit = synchronized {
        if (isActive) {
          deactivate()
          out.onComplete()
        }
      }
    })
  }
}

Notable implementation details

Subscriber extends Runnable - common optimization to minimize allocations
Access to isActive flag is synchronized because onComplete can be called from an asynchronous timeoutTask
timeoutTask is added to the subscription cancelable
syncOnStopOrFailure optimization

syncOnStopOrFailure

// F-bounded polymorphism, see 
// https://github.com/ghik/opinionated-scala/blob/master/chapters/Generics-and-type-members.md#f-bounded-polymorphism
implicit class AckExtensions[Self <: Future[Ack]](val source: Self) extends AnyVal {

  def syncOnStopOrFailure(
    cb: Option[Throwable] => Unit
  )(implicit r: UncaughtExceptionReporter): Self = {
    if (source eq Stop)
      try cb(None)
      catch { case e if NonFatal(e) => r.reportFailure(e) }
    else if (source ne Continue)
      source.onComplete { ack =>
        try ack match {
          case Success(Stop) => cb(None)
          case Failure(e) => cb(Some(e))
          case _ => ()
        } catch {
          case e if NonFatal(e) => r.reportFailure(e)
        }
      }(immediate)
      
    source
  }
}
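
The idea behind this optimization, inspecting an acknowledgement synchronously whenever it is already available, can be approximated with the standard library. This is a loose sketch under different assumptions than the code above: it checks Future#value instead of comparing against singleton Ack instances, it does not guard against exceptions thrown by the callback, and SimpleAck, Continue, Stop and onStopOrFailure are hypothetical local names.

```scala
import scala.concurrent.{ExecutionContext, Future}
import scala.util.{Failure, Success, Try}

object SyncFastPathSketch {
  // Local stand-ins for the acknowledgement values (NOT the Monix types)
  sealed trait SimpleAck
  case object Continue extends SimpleAck
  case object Stop extends SimpleAck

  // Invokes `cb` when `ack` ends in Stop or in a failure.
  // If the Future is already completed, the result is handled on the
  // calling thread and no callback is scheduled on the ExecutionContext.
  def onStopOrFailure(ack: Future[SimpleAck])(cb: Option[Throwable] => Unit)(
    implicit ec: ExecutionContext
  ): Future[SimpleAck] = {
    def handle(result: Try[SimpleAck]): Unit = result match {
      case Success(Stop) => cb(None)
      case Failure(e)    => cb(Some(e))
      case _             => () // Continue: nothing to do
    }

    ack.value match {
      case Some(result) => handle(result)         // fast path, synchronous
      case None         => ack.onComplete(handle) // slow path, asynchronous
    }
    ack
  }
}
```

For an already completed Stop, the callback has run by the time onStopOrFailure returns; for an already completed Continue, nothing is scheduled at all.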

Complete example

val result: Task[List[Int]] =
  Observable.repeatEval(Random.nextInt(10))
    .takeByTimespan(10.second)
    .toListL

Could be inlined to:

Task.create[mutable.ListBuffer[Int]] { (s, cb) =>
  // repeatEval -> takeByTimespan -> foldLeft, built by hand
  val source =
    new FoldLeftObservable[Int, mutable.ListBuffer[Int]](
      new TakeLeftByTimespanObservable(
        new RepeatEvalObservable(Random.nextInt(10)),
        10.second
      ),
      () => mutable.ListBuffer.empty[Int],
      _ += _
    )

  // firstOrElseL: complete the Task with the first (and only) element
  source.unsafeSubscribeFn(new Subscriber[mutable.ListBuffer[Int]] {
    implicit val scheduler: Scheduler = s
    private[this] var isDone = false

    def onNext(elem: mutable.ListBuffer[Int]): Ack = {
      cb.onSuccess(elem)
      isDone = true
      Stop
    }

    def onError(ex: Throwable): Unit =
      if (!isDone) {
        isDone = true
        cb.onError(ex)
      }

    def onComplete(): Unit =
      if (!isDone) {
        isDone = true
        cb(Try(mutable.ListBuffer.empty[Int]))
      }
  })
}.map(_.toList)

What we didn't cover

Subject (both Observable and Observer)
BufferedSubscriber
Hot Observable (sharing one Observable between multiple Subscribers)

Benchmarks

I'm about to show a few micro-benchmarks.

Please, keep in mind that the results can be misleading - it's best to measure for your specific use case.

API, ecosystem, and familiarity are usually better criteria, as long as the library meets your minimum performance requirements.

ChunkedMapFilterSum

def monixObservableNoChunks(): Int = {
  val stream = Observable
    .fromIterable(allElements)
    .map(_ + 1)
    .filter(_ % 2 == 0)

  sum(stream)
}

def akkaStreamNoChunks(): Long = {
  val stream = AkkaSource
    .fromIterator(() => allElements.iterator)
    .map(_ + 1)
    .filter(_ % 2 == 0)
    .toMat(AkkaSink.fold(0L)(_ + _))(Keep.right)

  Await.result(stream.run(), Duration.Inf)
}

def zioStream(): Int = {
  val stream = ZStream
    .fromChunks(zioChunks: _*)
    .map(_ + 1)
    .filter(_ % 2 == 0)
    .runSum

  zioUntracedRuntime.unsafeRun(stream)
}

def fs2Stream(): Int = {
  val stream = FS2Stream(fs2Chunks: _*)
    .flatMap(FS2Stream.chunk)
    .map(_ + 1)
    .filter(_ % 2 == 0)
    .compile
    .fold(0)(_ + _)

  stream
}

ChunkedMapFilterSum

[info] Benchmark  (chunkCount)  (chunkSize)   Mode  Cnt   Score   Error  Units
[info] akka               1000         1000  thrpt   20  10.866 ± 0.059  ops/s
[info] fs2                1000         1000  thrpt   20  55.301 ± 0.495  ops/s
[info] monix              1000         1000  thrpt   20  95.506 ± 0.241  ops/s
[info] zio                1000         1000  thrpt   20  32.106 ± 0.228  ops/s

MapAccumulate

def monixMapAccumulate() = {
  Observable
    .fromIterable(0 until n)
    .mapAccumulate(0) { case (acc, i) =>
      val added = acc + i
      (added, added)
    }
    .completedL
    .runSyncUnsafe()
}

def zioMapAccumulate() = {
  val stream = ZStream
    .fromIterable(0 until n)
    .mapAccum(0) { case (acc, i) =>
      val added = acc + i
      (added, added)
    }
    .runDrain

  zioUntracedRuntime.unsafeRun(stream)
}

def fs2MapAccumulate() = {
  FS2Stream
    .emits(0 until n)
    .mapAccumulate(0) { case (acc, i) =>
      val added = acc + i
      (added, added)
    }
    .compile
    .drain
}

MapAccumulate

[info] Benchmark  (n)   Mode  Cnt      Score     Error  Units
[info] fs2       1000  thrpt   20  66490.570 ± 211.840  ops/s
[info] fs2      10000  thrpt   20   8241.498 ±  52.588  ops/s
[info] monix     1000  thrpt   20  99300.153 ± 619.293  ops/s
[info] monix    10000  thrpt   20  10539.976 ± 203.321  ops/s
[info] zio       1000  thrpt   20   1819.379 ±  16.974  ops/s
[info] zio      10000  thrpt   20    201.752 ±   2.983  ops/s

ChunkedEvalFilterMapSum

def fs2Stream = {
  val stream = FS2Stream
    .apply(allElements: _*)
    .chunkN(chunkSize)
    .evalMap[MonixTask, Int](chunk => MonixTask(sumIntScala(chunk.iterator)))
    .filter(_ > 0)
    .map(_.toLong)
    .compile
    .fold(0L)(_ + _)
}

def fs2StreamPreChunked = {
  val stream = FS2Stream(fs2Chunks: _*)
    .evalMap[MonixTask, Int](chunk => MonixTask(sumIntScala(chunk.iterator)))
    .filter(_ > 0)
    .map(_.toLong)
    .compile
    .fold(0L)(_ + _)
}

ChunkedEvalFilterMapSum

[info] Benchmark (chunkCount)  (chunkSize)   Mode  Cnt    Score   Error  Units
[info] akka              1000         1000  thrpt   20   17.120 ± 0.418  ops/s
[info] akkaPreChunked    1000         1000  thrpt   20  214.725 ± 1.147  ops/s
[info] fs2               1000         1000  thrpt   20   63.284 ± 1.243  ops/s
[info] fs2PreChunked     1000         1000  thrpt   20  169.957 ± 7.040  ops/s
[info] monix             1000         1000  thrpt   20   77.922 ± 2.219  ops/s
[info] monixPreChunked   1000         1000  thrpt   20  364.595 ± 1.009  ops/s
[info] zio               1000         1000  thrpt   20  122.227 ± 5.387  ops/s
[info] zioPreChunked     1000         1000  thrpt   20  121.596 ± 2.566  ops/s

Tradeoffs

Cons:

Pure API, Dirty Internals - individual operators are hard to reason about in comparison to higher-level implementations of fs2/zio
Push Model - if you want to maximize throughput, you need to use buffers yourself
Shared Data Sources are not purely functional
Current implementation of flatMap is not stack-safe

Tradeoffs

Pros:

Pure API, Dirty Internals - nice API and best-in-class performance
Push Model - awesome for latency and time-based operators
Effect independent - Observable is fully capable of executing on its own, without any overhead of going through Task/IO Run-Loop, and could support all effect types natively
ReactiveX based - tons of related resources and a perfect step into FP for people coming from Java/JS :)

Final words

If you have any questions or more ideas, make sure to let us know at https://github.com/monix/monix or https://gitter.im/monix/monix
Recently, I've released https://github.com/monix/monix-bio - Cats-Effect friendly IO[E, A] implementation
Contributions are very welcome!
... Thank you for being here :)
