Understanding Monix Observable
Piotr Gawryś
About me
- An open source contributor for fun
- One of the maintainers of Monix
- Kraków Scala User Group co-organizer
https://github.com/Avasil
twitter.com/p_gawrys
Monix
- Scala / Scala.js library for asynchronous programming
- Multiple modules exposing Task, IO[E, A], Observable, Iterant, Coeval, Local, and many concurrency primitives
- Favors purely functional programming but provides for all
- Big focus on being both Future, and
Monix Observable
- Inspired by RxJava / ReactiveX
- Push-based with back-pressure
- See Alex's presentation for origins: https://monix.io/presentations/2018-tale-two-monix-streams.html
- Cold (single subscriber) streams are purely functional
High Level Example
val result: Task[Long] =
  Observable.fromIterable(allElements)
    .bufferTumbling(bufferSize)
    .mapEval(seq => Task(seq.sum))
    .filter(_ > 0)
    .map(_.toLong)
    .foldLeftL(0L)(_ + _)
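To make the result of this pipeline concrete, here is a minimal sketch of the same computation on strict Scala collections, with no Monix involved. PipelineSketch and total are hypothetical names, and the input is assumed to be a Seq[Int] because the slide leaves allElements and bufferSize undefined.

```scala
object PipelineSketch {
  // Assumption: the elements are Ints; the slide does not show their type
  def total(allElements: Seq[Int], bufferSize: Int): Long =
    allElements
      .grouped(bufferSize) // non-overlapping buffers, the role bufferTumbling plays above
      .map(_.sum)          // per-buffer sum (the mapEval step, without the Task)
      .filter(_ > 0)       // drop buffers whose sum is not positive
      .map(_.toLong)
      .foldLeft(0L)(_ + _) // final sum, like foldLeftL
}
```

For example, PipelineSketch.total(Seq(1, 2, 3, -10, 4, 5), 2) sums the buffers (1, 2) and (4, 5), skips (3, -10), and returns 12.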
High Level Example
val playerInputs: ConcurrentQueue[Task, MovementCommand] = ???
val gameStateObservable: Observable[GameState] =
  Observable(initialState) ++
    Observable
      .repeatEvalF(playerInputs.poll) // get inputs
      .groupBy(_.playerId) // create a sub-stream per playerId
      // emit only the first element every 150.millis per sub-stream
      // and merge them concurrently to one stream
      .mergeMap(_.throttleFirst(150.millis))
      .bufferTimed(150.millis) // collect results every 150.millis
      .scan0(initialState) { // a state machine to update the latest GameState
        case (GameState(players, bullets, environment), commands) =>
          val (updatedPlayers, updatedBullets) =
            moveTank(players, bullets, environment, commands)
          val newGameState =
            resolveCollisions(
              GameState(updatedPlayers, updatedBullets, environment)
            )
          newGameState
      }
Today, we're going to talk about internals!
Definition
trait Observer[-A] {
  def onNext(elem: A): Future[Ack]
  def onError(ex: Throwable): Unit
  def onComplete(): Unit
}
// Needs some kind of ExecutionContext to do
// anything with onNext (which returns Future)
trait Subscriber[-A] extends Observer[A] {
  implicit def scheduler: Scheduler
}
abstract class Observable[+A] {
  def unsafeSubscribeFn(subscriber: Subscriber[A]): Cancelable
}
[Diagram: two Observers subscribe to an Observable; the Observable then calls onNext on each Observer]
Observer#onNext protocol
trait Observer[-A] {
  def onNext(elem: A): Future[Ack]
}
sealed abstract class Ack extends Future[Ack]
case object Continue extends Ack
case object Stop extends Ack
- Grammar: onNext CAN be called zero, one or multiple times until onComplete, or onError
- Back-pressure: each onNext call MUST wait on a Continue result returned by the Future[Ack] of the previous onNext call
- Cancellation: after receiving Stop the data-source MUST no longer send any events
Observer protocol
trait Observer[-A] {
  def onError(ex: Throwable): Unit
  def onComplete(): Unit
}
- Grammar: either onComplete or onError can be called, at most once; never both
- Back-pressure: optional, not required to wait for the last onNext
Observer protocol
trait Observer[-A] {
  def onNext(elem: A): Future[Ack]
  def onError(ex: Throwable): Unit
  def onComplete(): Unit
}
- Ordering: all calls to onNext, onComplete, and onError MUST BE ordered and thus non-concurrent
- Exceptions: onNext, onError, and onComplete MUST NOT throw exceptions
- Full contract here https://monix.io/docs/current/reactive/observers.html#contract
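The contract above can be sketched with plain Scala Futures. This is only an illustration under simplifying assumptions: Ack and SimpleObserver below are simplified stand-ins (in Monix, Ack itself extends Future[Ack]), and feed and demo are hypothetical names, not Monix API.

```scala
import java.util.concurrent.atomic.AtomicBoolean
import scala.collection.mutable.ListBuffer
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ObserverContractSketch {
  // Simplified stand-ins for the types on the slides (assumptions, not Monix's real classes)
  sealed trait Ack
  case object Continue extends Ack
  case object Stop extends Ack

  trait SimpleObserver[-A] {
    def onNext(elem: A): Future[Ack]
    def onError(ex: Throwable): Unit
    def onComplete(): Unit
  }

  // Sends one element at a time: the next onNext is issued only after the
  // previous Future[Ack] completed with Continue (back-pressure), nothing is
  // sent after Stop (cancellation), and onComplete is called at most once.
  def feed[A](elems: List[A], out: SimpleObserver[A])(implicit ec: ExecutionContext): Future[Unit] =
    elems match {
      case Nil =>
        Future.successful(out.onComplete())
      case head :: tail =>
        out.onNext(head).flatMap {
          case Continue => feed(tail, out)
          case Stop     => Future.unit
        }
    }

  // An observer that acknowledges asynchronously and stops after three elements
  def demo(): (List[Int], Boolean) = {
    import ExecutionContext.Implicits.global
    val received = ListBuffer.empty[Int]
    val completed = new AtomicBoolean(false)
    val observer = new SimpleObserver[Int] {
      def onNext(elem: Int): Future[Ack] = Future {
        received += elem
        if (received.size < 3) Continue else Stop
      }
      def onError(ex: Throwable): Unit = ()
      def onComplete(): Unit = completed.set(true)
    }
    Await.result(feed(List(1, 2, 3, 4, 5), observer), 5.seconds)
    (received.toList, completed.get())
  }
}
```

Calling ObserverContractSketch.demo() returns (List(1, 2, 3), false): elements 4 and 5 are never sent because the observer answered Stop, and onComplete is not called after a Stop.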
Observable
- Subscriber/Observer subscribes to Observable and it starts emitting events
- subscribe returns a Cancelable, which allows stopping the computation from the outside
abstract class Observable[+A] {
  def unsafeSubscribeFn(subscriber: Subscriber[A]): Cancelable
}
Simple Observable
final class NowObservable[+A](elem: A) extends Observable[A] {
  def unsafeSubscribeFn(subscriber: Subscriber[A]): Cancelable = {
    // No need to back-pressure for onComplete
    subscriber.onNext(elem)
    subscriber.onComplete()
    // There's no specific action needed in case the connection is canceled
    Cancelable.empty
  }
}
Simple Subscriber
final class PrintSubscriber[-A] extends Subscriber[A] {
  override def scheduler: Scheduler = Scheduler.global
  override def onNext(elem: A): Future[Ack] = {
    println(s"Received $elem")
    Continue
  }
  override def onError(ex: Throwable): Unit = {
    println(s"Received error $ex")
  }
  override def onComplete(): Unit = {
    println(s"Received final event")
  }
}
Running Observable
val source: Observable[Int] = new NowObservable(10)
val cancelable: Cancelable =
  source.unsafeSubscribeFn(new PrintSubscriber)
// => Received 10
// => Received final event
new Observable[Int] {
  def unsafeSubscribeFn(subscriber: Subscriber[Int]): Cancelable = {
    subscriber.onNext(10)
    subscriber.onComplete()
    Cancelable.empty
  }
}.unsafeSubscribeFn(new Subscriber[Int] {
  override def scheduler: Scheduler = Scheduler.global
  override def onNext(elem: Int): Future[Ack] = {
    println(s"Received $elem")
    Continue
  }
  override def onError(ex: Throwable): Unit = {
    println(s"Received error $ex")
  }
  override def onComplete(): Unit = {
    println(s"Received final event")
  }
})
// => Received 10
// => Received final event
More complicated example
import monix.eval.Task
import monix.reactive.Observable
import scala.concurrent.duration._
import scala.util.Random
val result: Task[List[Int]] =
  Observable.repeatEval(Random.nextInt(10))
    .takeByTimespan(10.second)
    .toListL
Observable.repeatEval
object Observable {
  def repeatEval[A](task: => A): Observable[A] =
    new RepeatEvalObservable(task)
}
final class RepeatEvalObservable[+A](eval: => A) extends Observable[A] {
  def unsafeSubscribeFn(subscriber: Subscriber[A]): Cancelable = {
    val s = subscriber.scheduler
    val cancelable = BooleanCancelable()
    fastLoop(subscriber, cancelable, s.executionModel, 0)(s)
    cancelable
  }
  @tailrec
  def fastLoop(
    o: Subscriber[A],
    // We might check it periodically to
    // see if the subscription is not cancelled
    c: BooleanCancelable,
    // Scheduler has ExecutionModel, e.g. Synchronous, Batched, AlwaysAsync
    // We could add async boundaries according to it
    em: ExecutionModel,
    // BatchedExecution model inserts async boundary
    // after N synchronous operations
    syncIndex: Int
  )(implicit s: Scheduler): Unit = ???
}
@tailrec
def fastLoop(
  o: Subscriber[A],
  c: BooleanCancelable,
  em: ExecutionModel,
  syncIndex: Int
)(implicit s: Scheduler): Unit = {
  val ack =
    try o.onNext(eval)
    catch {
      case ex if NonFatal(ex) =>
        Future.failed(ex)
    }
  val nextIndex =
    if (ack == Continue) em.nextFrameIndex(syncIndex)
    else if (ack == Stop) -1
    else 0
  if (nextIndex > 0)
    fastLoop(o, c, em, nextIndex)
  else if (nextIndex == 0 && !c.isCanceled)
    reschedule(ack, o, c, em)
}
def reschedule(
  ack: Future[Ack],
  o: Subscriber[A],
  c: BooleanCancelable,
  em: ExecutionModel
)(implicit s: Scheduler): Unit =
  ack.onComplete {
    case Success(success) =>
      if (success == Continue) fastLoop(o, c, em, 0)
    case Failure(ex) =>
      s.reportFailure(ex)
    case _ => () // this was a Stop, do nothing
  }
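To illustrate how a frame index like the one returned by em.nextFrameIndex can decide between continuing synchronously and inserting an async boundary, here is a small self-contained sketch. Model, Synchronous, AlwaysAsync, Batched, and stepsUntilBoundary are hypothetical, simplified names for this illustration and are not Monix's ExecutionModel classes; the power-of-two batch size is an assumption that lets a bit mask replace modulo.

```scala
object FrameIndexSketch {
  // Hypothetical, simplified model: index 0 means "insert an async boundary now"
  sealed trait Model { def nextFrameIndex(current: Int): Int }

  // Never asks for a boundary: the index never returns to 0
  case object Synchronous extends Model {
    def nextFrameIndex(current: Int): Int = 1
  }

  // Asks for a boundary after every step
  case object AlwaysAsync extends Model {
    def nextFrameIndex(current: Int): Int = 0
  }

  // Wraps around to 0 after batchSize synchronous steps
  final case class Batched(batchSize: Int) extends Model {
    require(batchSize > 0 && (batchSize & (batchSize - 1)) == 0, "batchSize must be a power of two")
    private val mask = batchSize - 1
    def nextFrameIndex(current: Int): Int = (current + 1) & mask
  }

  // Counts how many steps run, starting at index 0, before the model asks
  // for an async boundary; gives up after `limit` steps.
  def stepsUntilBoundary(model: Model, limit: Int): Int = {
    @annotation.tailrec
    def loop(index: Int, steps: Int): Int = {
      val next = model.nextFrameIndex(index)
      if (next == 0 || steps + 1 >= limit) steps + 1
      else loop(next, steps + 1)
    }
    loop(0, 0)
  }
}
```

With this model, Batched(4) allows four steps before a boundary, AlwaysAsync allows one, and Synchronous only stops at the supplied limit.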
Observable#toListL
abstract class Observable[+A] {
  final def toListL: Task[List[A]] =
    foldLeft(mutable.ListBuffer.empty[A])(_ += _)
      // We know for sure that there will be only one element
      .firstOrElseL(mutable.ListBuffer.empty[A])
      .map(_.toList)
  final def foldLeft[R](seed: => R)(op: (R, A) => R): Observable[R] = ???
  final def firstOrElseL[B >: A](default: => B): Task[B] = ???
}
final class FoldLeftObservable[A, R](
  source: Observable[A],
  initial: () => R,
  f: (R, A) => R
) extends Observable[R] {
  def unsafeSubscribeFn(out: Subscriber[R]): Cancelable = {
    var streamErrors = true
    try {
      val initialState = initial()
      streamErrors = false
      source.unsafeSubscribeFn(new Subscriber[A] { ... })
    } catch {
      // If an error was thrown in source.unsafeSubscribeFn(...)
      // it is a breach of the protocol and the behavior is undefined
      // but we don't want to call out.onError in case it already happened there
      case NonFatal(ex) if streamErrors =>
        out.onError(ex)
        Cancelable.empty
    }
  }
}
source.unsafeSubscribeFn(new Subscriber[A] {
  implicit val scheduler = out.scheduler
  // We might call onError in onNext so we need this
  // flag to protect from potentially calling it twice
  // (once from onNext, once by upstream)
  private[this] var isDone = false
  private[this] var state: R = initialState
  def onNext(elem: A): Ack = {
    try {
      // User-supplied function
      // could throw exception
      state = f(state, elem)
      Continue
    } catch {
      case ex if NonFatal(ex) =>
        onError(ex)
        Stop
    }
  }
  def onComplete(): Unit =
    if (!isDone) {
      isDone = true
      out.onNext(state)
      out.onComplete()
    }
  def onError(ex: Throwable): Unit =
    if (!isDone) {
      isDone = true
      out.onError(ex)
    }
})
Are those vars thread-safe?
private[this] var isDone = false
private[this] var state: R = initialState
- They can be modified and read from a different thread, after all...
- but the protocol guarantees that such access is safe, and we'll see how!
Are those vars thread-safe?
If we follow onNext calls, it goes like that:
out.onNext(next).flatMap(_ => out2.onNext).flatMap(_ => out3.onNext) ...
out.onNext(next).flatMap(_ => out2.onNext).flatMap(_ => Continue) ...
out.onNext(next).flatMap(_ => Continue) ...
Continue
And then the next element is sent after Continue is received (remember ack.onComplete in repeatEval?)
Internally, each Future might be scheduled on a potentially different Thread with ec.execute():
var isDone = false
ec.execute(() => {
  isDone = true
  // second thread
  ec.execute(() => {
    assert(isDone)
  })
})
This establishes a happens-before relation between writing and reading isDone from potentially different threads.
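The same reasoning can be tried out with plain Scala Futures. The sketch below is an illustration only: State, HappensBeforeSketch, and run are hypothetical names, and the fields are deliberately neither volatile nor atomic, like the vars in the subscriber. Each step is submitted to the ExecutionContext only after the previous Future completed, so every write is visible to the next step even if it runs on another thread.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object HappensBeforeSketch {
  // Plain mutable fields: no @volatile, no atomics, no locks
  final class State {
    var isDone: Boolean = false
    var sum: Int = 0
  }

  def run(n: Int)(implicit ec: ExecutionContext): (Int, Boolean) = {
    val state = new State
    // Every step is its own task on the ExecutionContext, possibly on a
    // different thread, but it starts only after the previous step finished.
    val chained: Future[Unit] =
      (1 to n).foldLeft(Future.unit) { (acc, i) =>
        acc.flatMap(_ => Future { state.sum += i })
      }
    val done: Future[Unit] = chained.map { _ => state.isDone = true }
    // Awaiting the final Future also makes the writes visible to the caller
    Await.result(done, 10.seconds)
    (state.sum, state.isDone)
  }
}
```

With the global ExecutionContext in implicit scope, HappensBeforeSketch.run(100) returns (5050, true), because the updates never run concurrently and none of them is lost.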
Observable#toListL
abstract class Observable[+A] {
  final def toListL: Task[List[A]] =
    foldLeft(mutable.ListBuffer.empty[A])(_ += _)
      // We know for sure that there will be only one element
      .firstOrElseL(mutable.ListBuffer.empty[A])
      .map(_.toList)
  final def foldLeft[R](seed: => R)(op: (R, A) => R): Observable[R] =
    new FoldLeftObservable(this, () => seed, op)
  final def firstOrElseL[B >: A](default: => B): Task[B] = ???
}
Observable#firstOrElseL
final def firstOrElseL[B >: A](default: => B): Task[B] =
  Task.create { (s, cb) =>
    unsafeSubscribeFn(new Subscriber[A] {
      implicit val scheduler: Scheduler = s
      private[this] var isDone = false
      def onNext(elem: A): Ack = {
        cb.onSuccess(elem)
        isDone = true
        Stop
      }
      def onError(ex: Throwable): Unit =
        if (!isDone) {
          isDone = true
          cb.onError(ex)
        }
      def onComplete(): Unit =
        if (!isDone) {
          isDone = true
          cb(Try(default))
        }
    })
  }
Observable#firstOrElseL Bonus!
final def firstOrElseLZIOOO[B >: A](default: => B): zio.Task[B] = {
  ZIO.descriptorWith { desc =>
    ZIO.effectAsync { cb =>
      unsafeSubscribeFn(new Subscriber[A] {
        implicit val scheduler: Scheduler =
          Scheduler(desc.executor.asEC)
        private[this] var isDone = false
        def onNext(elem: A): Ack = {
          cb(ZIO.succeed(elem))
          isDone = true
          Stop
        }
        def onError(ex: Throwable): Unit =
          if (!isDone) {
            isDone = true
            cb(ZIO.fail(ex))
          }
        def onComplete(): Unit =
          if (!isDone) {
            isDone = true
            cb(ZIO(default))
          }
      })
    }
  }
}
TakeLeftByTimespanObservable
abstract class Observable[+A] {
  final def takeByTimespan(timespan: FiniteDuration): Observable[A] =
    new TakeLeftByTimespanObservable(this, timespan)
}
- Takes the elements until timespan passes
- We could run source as usual but run it concurrently with a timeoutTask that will stop the source gracefully
final class TakeLeftByTimespanObservable[A](
  source: Observable[A],
  timespan: FiniteDuration
) extends Observable[A] {
  def unsafeSubscribeFn(out: Subscriber[A]): Cancelable = {
    val composite = CompositeCancelable()
    composite += source.unsafeSubscribeFn(new Subscriber[A] with Runnable {
      implicit val scheduler = out.scheduler
      private[this] var isActive = true
      private[this] val timeoutTask: Cancelable = {
        val ref = scheduler.scheduleOnce(timespan.length, timespan.unit, this)
        composite += ref
        ref
      }
      def run(): Unit = onComplete()
      private def deactivate(): Unit = synchronized {
        isActive = false
        timeoutTask.cancel()
      }
      def onNext(elem: A): Future[Ack] = synchronized {
        if (isActive) out.onNext(elem).syncOnStopOrFailure(_ => deactivate())
        else Stop
      }
      def onError(ex: Throwable): Unit = synchronized {
        if (isActive) {
          deactivate()
          out.onError(ex)
        }
      }
      def onComplete(): Unit = synchronized {
        if (isActive) {
          deactivate()
          out.onComplete()
        }
      }
    })
  }
}
Notable implementation details
- Subscriber extends Runnable - common optimization to minimize allocations
- Access to isActive flag is synchronized because onComplete can be called from an asynchronous timeoutTask
- timeoutTask is added to the subscription cancelable
- syncOnStopOrFailure optimization
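The synchronized isActive guard can be illustrated in isolation with a standard Java scheduler. This is a sketch under assumptions, not the Monix implementation: Guard, TimeoutGuardSketch, and demo are hypothetical names, and the Guard doubles as the Runnable passed to the timer, as in the first bullet. Whichever side wins the race between the timer thread and the upstream completion, the terminal callback fires exactly once.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

object TimeoutGuardSketch {
  // The guard is itself the Runnable handed to the timer, so no extra
  // wrapper object is allocated for the timeout task.
  final class Guard(onDone: () => Unit) extends Runnable {
    private var isActive = true

    // Called by the timer thread when the timespan elapses
    def run(): Unit = complete()

    // Called by the upstream when the source completes on its own
    def complete(): Unit = synchronized {
      if (isActive) {
        isActive = false
        onDone()
      }
    }
  }

  // Races a timer against an upstream completion and reports how many
  // terminal events reached the downstream callback.
  def demo(): Int = {
    val timer = Executors.newSingleThreadScheduledExecutor()
    val terminalEvents = new AtomicInteger(0)
    val guard = new Guard(() => { terminalEvents.incrementAndGet(); () })
    timer.schedule(guard, 10L, TimeUnit.MILLISECONDS)
    guard.complete()                             // upstream completion
    timer.shutdown()                             // the already scheduled task still runs
    timer.awaitTermination(1L, TimeUnit.SECONDS) // wait for the timer thread to finish
    terminalEvents.get()
  }
}
```

TimeoutGuardSketch.demo() returns 1: the second call to complete() finds isActive already false and does nothing.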
syncOnStopOrFailure
// F-bounded polymorphism, see
// https://github.com/ghik/opinionated-scala/blob/master/chapters/Generics-and-type-members.md#f-bounded-polymorphism
implicit class AckExtensions[Self <: Future[Ack]](val source: Self) extends AnyVal {
  def syncOnStopOrFailure(
    cb: Option[Throwable] => Unit
  )(implicit r: UncaughtExceptionReporter): Self = {
    if (source eq Stop)
      try cb(None)
      catch { case e if NonFatal(e) => r.reportFailure(e) }
    else if (source ne Continue)
      source.onComplete { ack =>
        try ack match {
          case Success(Stop) => cb(None)
          case Failure(e) => cb(Some(e))
          case _ => ()
        } catch {
          case e if NonFatal(e) => r.reportFailure(e)
        }
      }(immediate)
    source
  }
}
Complete example
val result: Task[List[Int]] =
  Observable.repeatEval(Random.nextInt(10))
    .takeByTimespan(10.second)
    .toListL
Could be inlined to:
Task.create[mutable.ListBuffer[Int]] { (s, cb) =>
  // foldLeft over takeByTimespan over repeatEval, as plain constructors
  val source =
    new FoldLeftObservable[Int, mutable.ListBuffer[Int]](
      new TakeLeftByTimespanObservable(
        new RepeatEvalObservable(Random.nextInt(10)),
        10.second
      ),
      () => mutable.ListBuffer.empty[Int],
      _ += _
    )
  // firstOrElseL, inlined
  source.unsafeSubscribeFn(new Subscriber[mutable.ListBuffer[Int]] {
    implicit val scheduler: Scheduler = s
    private[this] var isDone = false
    def onNext(elem: mutable.ListBuffer[Int]): Ack = {
      cb.onSuccess(elem)
      isDone = true
      Stop
    }
    def onError(ex: Throwable): Unit =
      if (!isDone) {
        isDone = true
        cb.onError(ex)
      }
    def onComplete(): Unit =
      if (!isDone) {
        isDone = true
        cb(Try(mutable.ListBuffer.empty[Int]))
      }
  })
}.map(_.toList)
What we didn't cover
- Subject (both Observable and Observer)
- BufferedSubscriber
- Hot Observable (sharing one Observable between multiple Subscribers)
Benchmarks
I'm about to show a few micro-benchmarks.
Please keep in mind that the results can be misleading - it's best to measure for your specific use case.
API, ecosystem, and familiarity are usually better criteria, as long as the library meets the minimum performance requirements.
ChunkedMapFilterSum
def monixObservableNoChunks(): Int = {
  val stream = Observable
    .fromIterable(allElements)
    .map(_ + 1)
    .filter(_ % 2 == 0)
  sum(stream)
}
def akkaStreamNoChunks(): Long = {
  val stream = AkkaSource
    .fromIterator(() => allElements.iterator)
    .map(_ + 1)
    .filter(_ % 2 == 0)
    .toMat(AkkaSink.fold(0L)(_ + _))(Keep.right)
  Await.result(stream.run(), Duration.Inf)
}
def zioStream(): Int = {
  val stream = ZStream
    .fromChunks(zioChunks: _*)
    .map(_ + 1)
    .filter(_ % 2 == 0)
    .runSum
  zioUntracedRuntime.unsafeRun(stream)
}
def fs2Stream(): Int = {
  val stream = FS2Stream(fs2Chunks: _*)
    .flatMap(FS2Stream.chunk)
    .map(_ + 1)
    .filter(_ % 2 == 0)
    .compile
    .fold(0)(_ + _)
  stream
}
ChunkedMapFilterSum
[info] Benchmark  (chunkCount)  (chunkSize)   Mode  Cnt   Score   Error  Units
[info] akka               1000         1000  thrpt   20  10.866 ± 0.059  ops/s
[info] fs2                1000         1000  thrpt   20  55.301 ± 0.495  ops/s
[info] monix              1000         1000  thrpt   20  95.506 ± 0.241  ops/s
[info] zio                1000         1000  thrpt   20  32.106 ± 0.228  ops/s
MapAccumulate
def monixMapAccumulate() = {
  Observable
    .fromIterable(0 until n)
    .mapAccumulate(0) { case (acc, i) =>
      val added = acc + i
      (added, added)
    }
    .completedL
    .runSyncUnsafe()
}
def zioMapAccumulate() = {
  val stream = ZStream
    .fromIterable(0 until n)
    .mapAccum(0) { case (acc, i) =>
      val added = acc + i
      (added, added)
    }
    .runDrain
  zioUntracedRuntime.unsafeRun(stream)
}
def fs2MapAccumulate() = {
  FS2Stream
    .emits(0 until n)
    .mapAccumulate(0) { case (acc, i) =>
      val added = acc + i
      (added, added)
    }
    .compile
    .drain
}
MapAccumulate
[info] Benchmark    (n)   Mode  Cnt      Score     Error  Units
[info] fs2         1000  thrpt   20  66490.570 ± 211.840  ops/s
[info] fs2        10000  thrpt   20   8241.498 ±  52.588  ops/s
[info] monix       1000  thrpt   20  99300.153 ± 619.293  ops/s
[info] monix      10000  thrpt   20  10539.976 ± 203.321  ops/s
[info] zio         1000  thrpt   20   1819.379 ±  16.974  ops/s
[info] zio        10000  thrpt   20    201.752 ±   2.983  ops/s
ChunkedEvalFilterMapSum
def fs2Stream = {
  val stream = FS2Stream
    .apply(allElements: _*)
    .chunkN(chunkSize)
    .evalMap[MonixTask, Int](chunk => MonixTask(sumIntScala(chunk.iterator)))
    .filter(_ > 0)
    .map(_.toLong)
    .compile
    .fold(0L)(_ + _)
}
def fs2StreamPreChunked = {
  val stream = FS2Stream(fs2Chunks: _*)
    .evalMap[MonixTask, Int](chunk => MonixTask(sumIntScala(chunk.iterator)))
    .filter(_ > 0)
    .map(_.toLong)
    .compile
    .fold(0L)(_ + _)
}
ChunkedEvalFilterMapSum
[info] Benchmark        (chunkCount)  (chunkSize)   Mode  Cnt    Score   Error  Units
[info] akka                     1000         1000  thrpt   20   17.120 ± 0.418  ops/s
[info] akkaPreChunked           1000         1000  thrpt   20  214.725 ± 1.147  ops/s
[info] fs2                      1000         1000  thrpt   20   63.284 ± 1.243  ops/s
[info] fs2PreChunked            1000         1000  thrpt   20  169.957 ± 7.040  ops/s
[info] monix                    1000         1000  thrpt   20   77.922 ± 2.219  ops/s
[info] monixPreChunked          1000         1000  thrpt   20  364.595 ± 1.009  ops/s
[info] zio                      1000         1000  thrpt   20  122.227 ± 5.387  ops/s
[info] zioPreChunked            1000         1000  thrpt   20  121.596 ± 2.566  ops/s
Tradeoffs
Cons:
- Pure API, Dirty Internals - individual operators are hard to reason about in comparison to higher-level implementations of fs2/zio
- Push Model - if you want to maximize throughput, you need to use buffers yourself
- Shared Data Sources are not purely functional
- Current implementation of flatMap is not stack-safe
Tradeoffs
Pros:
- Pure API, Dirty Internals - nice API and best-in-class performance
- Push Model - awesome for latency and time-based operators
- Effect independent - Observable is fully capable of executing on its own, without any overhead of going through Task/IO Run-Loop, and could support all effect types natively
- ReactiveX based - tons of related resources and a perfect step into FP for people coming from Java/JS :)
Final words
- If you have any questions or more ideas, make sure to let us know at https://github.com/monix/monix or https://gitter.im/monix/monix
- Recently, I've released https://github.com/monix/monix-bio - Cats-Effect friendly IO[E, A] implementation
- Contributions are very welcome!
- ... Thank you for being here :)