Functional Concurrency in Scala 101

Piotr Gawryś

Who am I?

  • One of the maintainers of Monix
  • Contributor to Typelevel ecosystem
  • Kraków Scala User Group co-organizer
    (let me know if you'd like to speak!)
  • Super excited to be here!

https://github.com/Avasil

twitter.com/p_gawrys

What's the talk about?

  • Concurrency fundamentals on the JVM
  • Focused on purely functional programming (Cats-Effect, FS2, Monix, ZIO)

Concurrency

  • Interleaved processing of multiple tasks
  • Doesn't require more than one thread to work
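
To make the single-thread point concrete, here is a minimal sketch in plain Scala (no effect library). The `InterleavingDemo` object and its `roundRobin` helper are hypothetical names made up for this illustration: each task is modelled as an iterator of steps, and one thread advances every task by one step per round.

```scala
import scala.collection.mutable.ListBuffer

object InterleavingDemo {
  // Hypothetical helper: runs every task one step at a time, round-robin,
  // on the calling thread. Returns the order in which the steps ran.
  def roundRobin(tasks: List[Iterator[String]]): List[String] = {
    val log = ListBuffer.empty[String]
    var active = tasks
    while (active.nonEmpty) {
      // Advance each task that still has work by one step; drop finished ones.
      active = active.filter { task =>
        if (task.hasNext) { log += task.next(); true } else false
      }
    }
    log.toList
  }
}
```

Two tasks with steps `A1, A2, A3` and `B1, B2` come out as `A1, B1, A2, B2, A3`: both make progress although only one thread is involved.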

Parallelism

  • Executing multiple tasks at the same time to finish them faster
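
As a rough sketch of parallelism with the standard library only, the snippet below splits a sum into chunks and computes the partial sums on separate pool threads. `ParallelSumDemo` and `sumInParallel` are hypothetical names, and any speed-up assumes the machine has more than one core available.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ParallelSumDemo {
  // Hypothetical helper: sums `numbers` using up to `parallelism` threads.
  def sumInParallel(numbers: Vector[Long], parallelism: Int): Long = {
    require(parallelism > 0, "parallelism must be positive")
    val pool = Executors.newFixedThreadPool(parallelism)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    try {
      val chunkSize = math.max(1, math.ceil(numbers.size.toDouble / parallelism).toInt)
      // Each chunk is summed in its own Future, so chunks can run at the same time.
      val partialSums: Vector[Future[Long]] =
        numbers.grouped(chunkSize).toVector.map(chunk => Future(chunk.sum))
      // Blocking here is only acceptable because this is the edge of the demo.
      Await.result(Future.sequence(partialSums).map(_.sum), 10.seconds)
    } finally pool.shutdown()
  }
}
```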

Threads

  • Basic unit of CPU utilization
  • Typically managed by the operating system
  • Threads of the same process can share memory
  • A processor can usually run at most one thread per core at any given time

Threads on JVM

  • Map 1:1 to native OS threads
  • Each thread reserves around 1 MB of stack memory by default on 64-bit systems
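
A small sketch of starting a raw JVM thread, using the `java.lang.Thread` constructor that accepts a requested stack size. `ThreadDemo` and `runOnNewThread` are hypothetical names; the stack size is only a hint that the JVM may round or ignore.

```scala
import java.util.concurrent.atomic.AtomicReference

object ThreadDemo {
  // Hypothetical helper: runs a tiny body on a fresh thread and reports
  // the name of the thread that executed it.
  def runOnNewThread(name: String, stackSizeBytes: Long): String = {
    val observed = new AtomicReference[String]("")
    val body: Runnable = () => observed.set(Thread.currentThread().getName)
    // Thread(group, target, name, stackSize): stackSize is a hint to the JVM.
    val thread = new Thread(null, body, name, stackSizeBytes)
    thread.start()
    thread.join() // wait for the new thread to finish
    observed.get()
  }
}
```

Calling `ThreadDemo.runOnNewThread("worker-1", 256L * 1024)` returns `"worker-1"`.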

JVM Memory Model

[Diagram: in the JVM, a shared heap holds objects, while each thread has its own thread stack with a call stack and local variables. On the hardware side, each CPU core (cores 1 to 3 shown) has its own registers and cache, and all cores share main memory.]

Context Switch

  • Happens when a different thread starts running on a CPU core
  • The OS needs to store the state of the old thread and restore the state of the new one
  • Cache locality is lost
  • Synchronous execution => no context switches => best performance

Thread Pools

  • Take care of managing threads for us
  • Can reuse threads
  • Come in several types, e.g. cached, single-threaded, work-stealing
  • Think ExecutionContext (Future), ContextShift (cats.effect.IO), Scheduler (Monix Task)
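
The "can reuse threads" point can be sketched with a plain `java.util.concurrent` pool: many small tasks are submitted, and we count how many distinct threads actually ran them. `PoolReuseDemo` and `distinctThreads` are hypothetical names for this illustration.

```scala
import java.util.concurrent.{ConcurrentHashMap, Executors, TimeUnit}

object PoolReuseDemo {
  // Hypothetical helper: runs `tasks` tiny tasks on a fixed pool and returns
  // the number of distinct threads that executed them.
  def distinctThreads(poolSize: Int, tasks: Int): Int = {
    val pool = Executors.newFixedThreadPool(poolSize)
    val names = ConcurrentHashMap.newKeySet[String]()
    (1 to tasks).foreach { _ =>
      pool.execute(() => { names.add(Thread.currentThread().getName); () })
    }
    pool.shutdown() // accept no new tasks, let the queued ones finish
    pool.awaitTermination(10, TimeUnit.SECONDS)
    names.size
  }
}
```

With `distinctThreads(2, 100)` the result never exceeds 2: a hundred tasks are mapped onto at most two threads.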

Thread Pools

import java.util.concurrent.Executors
import monix.execution.Scheduler
import scala.concurrent.ExecutionContext

val ec = ExecutionContext.fromExecutor(Executors.newCachedThreadPool())
val scheduler = Scheduler(ec)

Extra capabilities

import monix.eval.Task
import monix.execution.CancelableFuture
import monix.execution.schedulers.TestScheduler
import scala.concurrent.duration._
import cats.implicits._

val sc = TestScheduler()

val failedTask: Task[Int] = Task.sleep(2.days) >> 
  Task.raiseError[Int](new Exception("boom"))

val f: CancelableFuture[Int] = failedTask.runToFuture(sc)
  
println(f.value) // None
sc.tick(10.hours)
println(f.value) // None
sc.tick(1000.days)
println(f.value) // Some(Failure(java.lang.Exception: boom))

Asynchronous Boundary

  • The Task returns to the thread pool to be scheduled again
  • Many scenarios can occur, e.g.:
    • Cancelation
    • Another Task starts executing
    • The same Task resumes on the same thread as before
    • A new thread is created
    • and more!

Asynchronous Boundary

val s: Scheduler = Scheduler.global

def repeat: Task[Unit] =
  for {
    _ <- Task.shift
    _ <- Task(println(s"Shifted to: ${Thread.currentThread().getName}"))
    _ <- repeat
  } yield ()

repeat.runToFuture(s)

Asynchronous Boundary

// Output:
// Shifted to: scala-execution-context-global-12
// Shifted to: scala-execution-context-global-12
// Shifted to: scala-execution-context-global-12
// Shifted to: scala-execution-context-global-14
// Shifted to: scala-execution-context-global-14
// Shifted to: scala-execution-context-global-14
// Shifted to: scala-execution-context-global-12
// Shifted to: scala-execution-context-global-12
// Shifted to: scala-execution-context-global-13
// Shifted to: scala-execution-context-global-13
// Shifted to: scala-execution-context-global-13
// Shifted to: scala-execution-context-global-12
// Shifted to: scala-execution-context-global-12
// ...

Task Scheduling

val s1: Scheduler = Scheduler(
  ExecutionContext.fromExecutor(Executors.newSingleThreadExecutor()),
  ExecutionModel.SynchronousExecution)

def repeat(id: Int): Task[Unit] =
    Task(print(id)).flatMap(_ => repeat(id))

val prog = (repeat(1), repeat(2)).parTupled

prog.runToFuture(s1)

// Output:
// 1111111111111111111111111111111111111111111111111111111111...

Task Scheduling

val s1: Scheduler = Scheduler(
  ExecutionContext.fromExecutor(Executors.newSingleThreadExecutor()),
  ExecutionModel.SynchronousExecution)

def repeat(id: Int): Task[Unit] =
  Task(print(id)).flatMap(_ => Task.shift >> repeat(id))

val prog = (repeat(1), repeat(2)).parTupled

prog.runToFuture(s1)

// Output:
// 121212121212121212121212121212121212121212121 ...

Task Scheduling

// 4 = number of cores on my laptop
val s1: Scheduler = Scheduler(
  ExecutionContext.fromExecutor(Executors.newFixedThreadPool(4)),
  ExecutionModel.SynchronousExecution)
val s2: Scheduler = Scheduler(
  ExecutionContext.fromExecutor(Executors.newFixedThreadPool(1)),
  ExecutionModel.SynchronousExecution)

def repeat(id: Int): Task[Unit] =
  Task(print(id)).flatMap(_ => repeat(id))

val program = (repeat(1), repeat(2), repeat(3), repeat(4), repeat(5),
  repeat(6).executeOn(s2)).parTupled

program.runToFuture(s1)

// Output:
// 143622331613424316134424316134243161342431613424316134243 ...
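// No '5'! Five tasks share the four threads of s1 and, with no async
// boundaries, the tasks that got a thread never give it up.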

Light Async Boundary

  • Continues on the same thread by means of a trampoline
  • Checks with thread pool for cancelation status (in Monix & Cats-Effect IO)
  • Can help with stack safety
  • Low level!
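
To illustrate what "continues on the same thread by means of a trampoline" can look like, here is a deliberately simplified, single-threaded sketch. It is not Monix's `TrampolineExecutionContext`; the `Trampoline` class and `countTo` helper are hypothetical. Nested submissions are queued and drained by the outermost call, so the call stack stays flat.

```scala
import scala.collection.mutable

object TrampolineDemo {
  // Hypothetical, non-thread-safe trampoline: tasks submitted while another
  // task is running are queued instead of being called recursively.
  final class Trampoline {
    private val queue = mutable.Queue.empty[() => Unit]
    private var running = false

    def execute(task: () => Unit): Unit = {
      queue.enqueue(task)
      if (!running) {
        running = true
        try {
          while (queue.nonEmpty) queue.dequeue()() // drain on the current thread
        } finally {
          running = false
        }
      }
    }
  }

  // Counts to `n`, crossing one "light boundary" per step.
  def countTo(n: Int): Int = {
    val trampoline = new Trampoline
    var count = 0
    def loop(i: Int): Unit =
      if (i < n) trampoline.execute { () =>
        count += 1
        loop(i + 1) // only enqueues; the outer while loop runs it
      }
    loop(0)
    count
  }
}
```

`TrampolineDemo.countTo(1000000)` completes without a `StackOverflowError`, which is the stack-safety benefit mentioned above.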

Light Async Boundary

implicit val s = Scheduler.global
  .withExecutionModel(ExecutionModel.SynchronousExecution)

def task(i: Int): Task[Unit] =
  Task(println(s"$i: ${Thread.currentThread().getName}")) >> 
    Task.shift(TrampolineExecutionContext.immediate) >> task(i + 1)

val t =
  for {
    fiber <- task(0)
      .doOnCancel(Task(println("cancel")))
      .start
    _ <- fiber.cancel
  } yield ()

t.runToFuture 

// Output: 
// 0: scala-execution-context-global-11
// 1: scala-execution-context-global-11
// 2: scala-execution-context-global-11
// cancel

Light Async Boundary

val immediate: TrampolineExecutionContext =
  TrampolineExecutionContext(new ExecutionContext {
    def execute(r: Runnable): Unit = r.run()
    def reportFailure(e: Throwable): Unit = throw e
  })

Blocking Threads

  • Don't
  • A blocked thread is wasted and, if the thread pool is bounded, might prevent other tasks from being scheduled
  • Use a dedicated thread pool for blocking operations
  • Use timeouts
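
The last two points can be sketched with the standard library alone: the blocking call runs on its own cached pool, and the caller gives up after a timeout. `BlockingDemo` and `callWithTimeout` are hypothetical names, `Thread.sleep` stands in for a real blocking operation, and `Await.result` is used only because this is the very edge of a demo program.

```scala
import java.util.concurrent.{Executors, TimeoutException}
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object BlockingDemo {
  // Hypothetical helper: runs a blocking call on a dedicated pool and waits
  // at most `timeoutMillis` for its result.
  def callWithTimeout(blockForMillis: Long, timeoutMillis: Long): Either[String, String] = {
    // Unbounded cached pool reserved for blocking work only.
    val blockingPool = Executors.newCachedThreadPool()
    val blockingEc = ExecutionContext.fromExecutor(blockingPool)
    val result: Future[String] = Future {
      Thread.sleep(blockForMillis) // stand-in for a blocking operation
      "done"
    }(blockingEc)
    try Right(Await.result(result, timeoutMillis.millis))
    catch { case _: TimeoutException => Left("timed out") }
    finally blockingPool.shutdown()
  }
}
```

Note that timing out here only stops waiting; the blocked thread itself keeps running until the operation returns.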

Dealing with blocking ops

implicit val globalScheduler = Scheduler.global
val blockingScheduler = Scheduler.io()

val blockingOp = Task {
  Thread.sleep(1000)
  println(s"${Thread.currentThread().getName}: done blocking")
}

val cpuBound = Task {
  // keep cpu busy
  println(s"${Thread.currentThread().getName}: done calculating")
}

val t =
  for {
    _ <- cpuBound
    _ <- blockingOp.executeOn(blockingScheduler)
    _ <- cpuBound
  } yield ()

t.runSyncUnsafe()

// scala-execution-context-global-11: done calculating
// monix-io-12: done blocking
// scala-execution-context-global-11: done calculating

Semantic / Asynchronous Blocking

  • A Task waits for a certain signal before it can complete
  • Doesn't actually block any threads; other tasks can execute in the meantime

Semantic / Asynchronous Blocking

val ec = ExecutionContext.fromExecutor(Executors.newSingleThreadExecutor())
implicit val scheduler = Scheduler(ec)

val otherTask = Task.sleep(50.millis) >> Task(println("Running concurrently"))

val t =
  for {
    sem <- Semaphore[Task](0L)
    // `start` runs the task asynchronously "in the background",
    // so the next step of the for-comprehension begins right away
    _ <- (Task.sleep(100.millis) >>
      Task(println("Releasing semaphore")) >> sem.release).start
    _ <- Task(println("Waiting for permit")) >> sem.acquire
    _ <- Task(println("Done!"))
  } yield ()

(t, otherTask).parTupled.runSyncUnsafe()

// Waiting for permit
// Running concurrently
// Releasing semaphore
// Done!

Fairness

  • The likelihood that different tasks are able to make progress
  • Without fairness, latencies can differ widely between requests
  • Monix and ZIO insert async boundaries from time to time (configurable)
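
The idea of inserting a boundary "from time to time" can be imitated with a plain single-thread executor. This is only an illustration of the principle, not how Monix or ZIO implement it; `FairnessDemo`, `run` and `batchSize` are hypothetical names. Each task runs `batchSize` steps synchronously and then resubmits itself, giving the other task a turn.

```scala
import java.util.concurrent.{CountDownLatch, Executors}

object FairnessDemo {
  // Hypothetical helper: runs tasks "1" and "2" (`steps` steps each) on ONE
  // thread, yielding to the pool after every `batchSize` steps.
  // Returns the task id printed at each step, in execution order.
  def run(steps: Int, batchSize: Int): String = {
    require(batchSize > 0, "batchSize must be positive")
    val executor = Executors.newSingleThreadExecutor()
    val out = new StringBuilder // written only by the single worker thread
    val done = new CountDownLatch(2)

    def loop(id: String, remaining: Int): Unit = executor.execute { () =>
      val n = math.min(batchSize, remaining)
      (1 to n).foreach(_ => out.append(id))
      if (remaining - n > 0) loop(id, remaining - n) // forced async boundary
      else done.countDown()
    }

    // Start both tasks from the worker thread so the queue order is deterministic.
    executor.execute { () => loop("1", steps); loop("2", steps) }
    done.await()
    executor.shutdown()
    out.toString
  }
}
```

`FairnessDemo.run(4, 4)` yields `11112222` (the first task hogs the thread), while `FairnessDemo.run(4, 2)` yields `11221122`: smaller batches trade some throughput for fairness.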

Green Threads and Fibers

Green Thread

A thread scheduled by the VM instead of the OS. Cheap to start, and M green threads can be mapped onto N OS threads.

Fiber

"Lightweight thread". Fibers voluntarily yield control to the scheduler.

Is Task like a Green Thread/Fiber?

  • We can map thousands of Tasks to very few OS threads
  • Async boundaries give a chance to other tasks from the thread pool to execute, very much like cooperative multitasking
  • "Blocking" a Task itself is OK since it doesn't block underlying Threads

Thank you!

I will appreciate any feedback. :)

If you're looking for help/discussion:

https://gitter.im/typelevel/cats-effect

https://gitter.im/functional-streams-for-scala/fs2

https://gitter.im/monix/monix

https://gitter.im/scalaz/scalaz-zio
