Using an actor framework for scientific computing
opportunities and challenges
author: Krzysztof Borowski
supervisor: Ph.D. Bartosz Baliś
@liosedhel
Workflow
- Directed Graph
- Many inputs/Many outputs
- Nodes - activities
- Edges - dependencies (control flow)
- Each node activity == fun(Data): Result
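The workflow definition above can be sketched in a few lines of plain Scala. Each node is an activity `fun(inputs): result`, and each edge says which results it needs. This is a minimal, sequential illustration under simplifying assumptions: data is Int-valued, the graph is acyclic, and the names `Node` and `execute` are hypothetical. It is not Scaflow code.

```scala
// Toy model of a workflow DAG (hypothetical names, Int-valued data for brevity).
// Each node is an activity fun(inputs): result; `deps` are its incoming edges.
final case class Node(name: String, deps: List[String], run: List[Int] => Int)

// Evaluate every node after all of its dependencies (memoised depth-first search).
// Assumes the graph is acyclic; a cycle would recurse until the stack overflows.
def execute(nodes: List[Node]): Map[String, Int] = {
  val byName = nodes.map(n => n.name -> n).toMap

  def eval(name: String, done: Map[String, Int]): Map[String, Int] =
    if (done.contains(name)) done
    else {
      val node = byName(name)
      val withDeps = node.deps.foldLeft(done)((acc, dep) => eval(dep, acc))
      withDeps + (name -> node.run(node.deps.map(withDeps)))
    }

  nodes.foldLeft(Map.empty[String, Int])((acc, n) => eval(n.name, acc))
}

// "sum" has many inputs; its result feeds many outputs ("square" and "double").
val results = execute(List(
  Node("a", Nil, _ => 2),
  Node("b", Nil, _ => 3),
  Node("sum", List("a", "b"), _.sum),
  Node("square", List("sum"), xs => xs.head * xs.head),
  Node("double", List("sum"), xs => xs.head * 2)
))
println(results("square")) // 25
```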
Scientific workflow
- Data elements can be big
- Activities can be long-running and resource intensive
- Often invoke legacy code (e.g. Fortran, C) or external services
Scientific workflow - requirements
- Parallelization and distribution of computations
- Persistence and recovery
- Fault tolerance
Actor Model
- State isolation
- Asynchronous communication
- Changing behavior
- Spawning new actors
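The toy model below makes the four properties concrete. It is single-threaded and uses only the Scala standard library. It is not Akka, and `ToySystem`, `ToyActor` and `become` are illustrative names (the last one is modeled loosely on Akka's `context.become`).

In this model, a shared queue stands in for asynchronous delivery, and each actor keeps private state. While handling a message, an actor can swap its behavior or create a new actor.

```scala
import scala.collection.mutable

// A single queue stands in for asynchronous message delivery (toy model, not Akka).
final class ToySystem {
  private val queue = mutable.Queue.empty[(ToyActor, Any)]
  def tell(to: ToyActor, msg: Any): Unit = queue.enqueue((to, msg)) // fire-and-forget
  def run(): Unit =
    while (queue.nonEmpty) {
      val (actor, msg) = queue.dequeue()
      actor.handle(msg)
    }
}

abstract class ToyActor {
  type Behavior = PartialFunction[Any, Unit]
  def receive: Behavior
  private var current: Behavior = receive
  protected def become(next: Behavior): Unit = current = next // behavior change
  final def handle(msg: Any): Unit = current.applyOrElse(msg, (_: Any) => ())
}

final class Reporter(out: mutable.Buffer[String]) extends ToyActor {
  def receive: Behavior = { case n: Int => out += s"count = $n" }
}

final class Counter(system: ToySystem, out: mutable.Buffer[String]) extends ToyActor {
  private var count = 0 // isolated state: only this actor reads or writes it
  def receive: Behavior = {
    case "inc" => count += 1
    case "done" =>
      val reporter = new Reporter(out) // spawning a new actor
      system.tell(reporter, count)
      become { case _ => out += "ignored" } // later messages get the new behavior
  }
}

val system = new ToySystem
val out = mutable.Buffer.empty[String]
val counter = new Counter(system, out)
List("inc", "inc", "inc", "done", "inc").foreach(system.tell(counter, _))
system.run()
out.foreach(println) // prints "ignored", then "count = 3"
```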
But why exactly the actor model?
Similarities. Why not give it a shot?
Actor Model difficulties
Aspect | Flow activity | Actors |
---|---|---|
Input data | Many typed input channels | A single, untyped mailbox |
Output data | Many typed output channels | No output channels |
Flow patterns | Complex patterns | Simple asynchronous "fire-and-forget" messages |
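One common workaround for the single-mailbox limitation is to tag each message with the logical input it belongs to. The actor then buffers values until every input has one.

The sketch below shows that idea for a two-input activity in plain Scala. `PortA`, `PortB` and `JoinActivity` are hypothetical names, and a method call stands in for message delivery.

```scala
import scala.collection.mutable

// Hypothetical encoding of a two-input activity on top of a single mailbox:
// the message type tells the actor which logical input port a value belongs to.
sealed trait Port
final case class PortA(value: Int) extends Port
final case class PortB(value: String) extends Port

final class JoinActivity(f: (Int, String) => String) {
  private val pendingA = mutable.Queue.empty[Int]    // per-port buffers live in actor state
  private val pendingB = mutable.Queue.empty[String]
  val outputs = mutable.Buffer.empty[String]         // stands in for an output channel

  // The single entry point, playing the role of the mailbox handler.
  def receive(msg: Port): Unit = {
    msg match {
      case PortA(v) => pendingA.enqueue(v)
      case PortB(v) => pendingB.enqueue(v)
    }
    // Fire the activity only when every input port has a value waiting.
    while (pendingA.nonEmpty && pendingB.nonEmpty)
      outputs += f(pendingA.dequeue(), pendingB.dequeue())
  }
}

val join = new JoinActivity((n, s) => s * n)
List[Port](PortA(2), PortA(3), PortB("ab"), PortB("x")).foreach(join.receive)
join.outputs.foreach(println) // prints "abab", then "xxx"
```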
Akka Streams to the rescue!
- Built on the actor model
- Support for complex flows (beautiful graph-oriented API)
- Concurrent data processing
But...
Scientific workflows | Akka Streams |
---|---|
Bounded input data set | Unbounded data stream |
Large data elements | Small data elements |
Focused on scaling | Focused on back-pressure (a Reactive Streams implementation) |
Recovery mechanism is essential | High message throughput is essential |
Current workflow engines
- Kepler, Taverna, Pegasus
- Focused on graphical interfaces for non-programmers, or expose complicated APIs
- Complex
- At the end of the day, you still have to write an external program
Scaflow - A New Hope
- Simple workflow engine for scientific computations
- For programmers
- Built with modern technologies (Scala, Akka, Cassandra) in less than 1.5k lines of code
Components
- Source
- Data processing
- Data filtering
- Data grouping
- Broadcast data
- Merge data
- Synchronize data
- Data sink
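As a rough mental model, the fan-out and fan-in components can be compared to operations on in-memory lists. This is an assumption-laden analogy, not Scaflow's implementation.

Real stages exchange actor messages. As a result, `merge` would interleave elements in arrival order rather than concatenate them, and `synchronize` would wait for a matching element on every input. The helper names below are hypothetical.

```scala
// Hypothetical list analogues of three Scaflow components (not Scaflow's API).

// Broadcast: every element is sent to each downstream branch.
def broadcast[A, B, C](in: List[A])(f: A => B, g: A => C): (List[B], List[C]) =
  (in.map(f), in.map(g))

// Merge: elements from several inputs end up in one output
// (concatenated here; a concurrent system would use arrival order).
def merge[A](inputs: List[A]*): List[A] = inputs.toList.flatten

// Synchronize: emit only when every input has delivered its next element.
def synchronize[A, B](left: List[A], right: List[B]): List[(A, B)] = left.zip(right)

val (squares, labels) = broadcast(List(1, 2, 3))(a => a * a, a => s"item $a")
val merged = merge(List(1, 2), List(3), List(4, 5))
val pairs = synchronize(List(1, 2, 3), List("a", "b")) // 3 has no partner yet
```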
Scaflow API
```scala
StandardWorkflow.source(List(1, 2, 3, 4, 5, 6))
  .map(a => a * a)
  .group(3)
  .map(_.sum)
  .sink(println).run
```
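To check what this pipeline computes, here is the same chain on an ordinary Scala `List`. It assumes `group(3)` collects consecutive batches of three elements, as `grouped(3)` does. In the actor-based version, batches form from elements in the order they reach the grouping stage.

```scala
// The StandardWorkflow example, replayed on plain collections.
val squares = List(1, 2, 3, 4, 5, 6).map(a => a * a) // List(1, 4, 9, 16, 25, 36)
val sums = squares.grouped(3).map(_.sum).toList       // List(14, 77)
sums.foreach(println)                                 // prints 14, then 77
```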
Concurrent computations in actor model
Push model
Pull model
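The slides do not spell out the difference between the two models, so here is one common reading, sketched under assumptions:

- In a push model, a dispatcher hands tasks to workers up front (round-robin below).
- In a pull model, an idle worker asks for the next task.

With uneven task durations, pulling keeps all workers busy. `pushMakespan` and `pullMakespan` are hypothetical helpers that compute when the last worker finishes.

```scala
// Toy scheduling model: each number is a task duration; a worker runs one task at a time.

// Push: tasks are dealt round-robin up front, regardless of how busy each worker is.
def pushMakespan(tasks: List[Int], workers: Int): Int =
  tasks.zipWithIndex
    .groupBy { case (_, i) => i % workers }
    .values
    .map(_.map(_._1).sum)
    .max

// Pull: the worker that becomes idle first asks for the next task.
def pullMakespan(tasks: List[Int], workers: Int): Int = {
  val freeAt = Array.fill(workers)(0)            // time at which each worker becomes idle
  for (t <- tasks) {
    val w = freeAt.indices.minBy(i => freeAt(i)) // earliest idle worker
    freeAt(w) += t
  }
  freeAt.max
}

val tasks = List(5, 1, 1, 1, 5, 1, 1, 1)
println(s"push: ${pushMakespan(tasks, 2)}, pull: ${pullMakespan(tasks, 2)}") // prints "push: 12, pull: 8"
```

With push, one worker receives both long tasks and finishes at time 12 while the other idles after 4. With pull, the load evens out and both finish at 8.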
Scalability in actor model
Workflow persistent state
- Event sourcing
- Persistent actors
```scala
// Recovery: replay journaled values and re-issue deliveries for those that pass the filter
override def receiveRecover: Receive = receiveRecoverWithAck {
  case n: NextVal[A] =>
    if (filter(n.data)) deliver(destination, n)
}

// Normal operation: journal each incoming value, then forward it if it passes the filter
override def receiveCommand: Receive = receiveCommandWithAck {
  case n: NextVal[A] =>
    persistAsync(n) { e =>
      if (filter(e.data)) deliver(destination, n)
    }
}
/* ... */

PersistentWorkflow.source("source", List(1, 2, 3, 4, 5, 6))
  .map("square", a => a * a)
  .group("group", 3)
  .map("sum", _.sum)
  .sink(println).run
```
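The idea behind the persistent filter stage can be illustrated without Akka Persistence. Every incoming value is appended to a journal before it is acted on. After a restart, the stage rebuilds its effects by replaying that journal.

In this sketch, an in-memory buffer plays the journal, and `Event` and `FilterStage` are hypothetical names. Replay repeats side effects. The `deliver` and `...WithAck` helpers in the snippet above suggest the real stage handles this with at-least-once delivery and acknowledgements.

```scala
import scala.collection.mutable

final case class Event(data: Int) // simplified stand-in for NextVal

final class FilterStage(filter: Int => Boolean, journal: mutable.Buffer[Event]) {
  val delivered = mutable.Buffer.empty[Int] // what was sent downstream

  // Command side: persist first, then perform the side effect.
  def receiveCommand(e: Event): Unit = {
    journal += e
    if (filter(e.data)) delivered += e.data
  }

  // Recovery side: state is never stored, only re-derived from the journal.
  def recover(): Unit =
    journal.foreach(e => if (filter(e.data)) delivered += e.data)
}

val journal = mutable.Buffer.empty[Event]
val live = new FilterStage(_ % 2 == 0, journal)
(1 to 6).foreach(i => live.receiveCommand(Event(i)))

// Simulate a crash: a fresh instance sees only the journal.
val restarted = new FilterStage(_ % 2 == 0, journal)
restarted.recover()
println(restarted.delivered.toList) // prints List(2, 4, 6)
```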
Fault tolerance
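```scala
import java.util.concurrent.TimeoutException // assumed: the JDK exception type
import scala.concurrent.duration._
import akka.actor.OneForOneStrategy
import akka.actor.SupervisorStrategy.{Restart, Stop}

// Up to 10 restarts within 5 seconds; beyond that the failing child is stopped
val HTTPSupervisorStrategy = OneForOneStrategy(10, 5.seconds) {
  case _: TimeoutException => Restart // retry
  case _                   => Stop    // drop the message
}
PersistentWorkflow.connector[String]("pngConnector")
  .map("getPng", getPathwayMapPng, Some(HTTPSupervisorStrategy))
  .sink("sinkPng", id => println(s"PNG map downloaded for $id"))
```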
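The decision logic of `HTTPSupervisorStrategy` can be mimicked with a plain retry loop: timeouts are retried and any other failure drops the message. This is an analogy with hypothetical `Directive`, `decider` and `runSupervised` names, and it does not model the 5-second window.

There is one more difference from Akka. There, a `Restart` on its own does not re-run the message that caused the failure. The retry in Scaflow presumably relies on persistent stages redelivering unacknowledged messages, which is an assumption.

```scala
import java.util.concurrent.TimeoutException

// Hypothetical stand-ins for Akka's Restart/Stop directives.
sealed trait Directive
case object Restart extends Directive
case object Stop extends Directive

// Same decision as HTTPSupervisorStrategy: retry on timeouts, drop on anything else.
val decider: Throwable => Directive = {
  case _: TimeoutException => Restart
  case _                   => Stop
}

// Run a task, retrying while the decider answers Restart and retries remain.
def runSupervised[A](maxRetries: Int)(task: () => A): Option[A] = {
  def attempt(retriesLeft: Int): Option[A] =
    try Some(task())
    catch {
      case e: Exception =>
        decider(e) match {
          case Restart if retriesLeft > 0 => attempt(retriesLeft - 1)
          case _                          => None // message dropped
        }
    }
  attempt(maxRetries)
}

var calls = 0
val flakyDownload: () => String = () => {
  calls += 1
  if (calls < 3) throw new TimeoutException("server too slow") else s"png after $calls calls"
}

val ok = runSupervised(maxRetries = 10)(flakyDownload) // succeeds on the third call
val dropped = runSupervised[String](maxRetries = 10)(() => throw new IllegalArgumentException("bad id"))
```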
Real world example
Real world example cont.
```scala
val HTTPSupervisorStrategy = OneForOneStrategy(10, 10.seconds) {
  case _: TimeoutException => Restart // try to perform the operation again
  case _                   => Stop    // drop the message
}

val savePathwayPngFlow = PersistentWorkflow.connector[String]("pngConnector")
  .map("getPng", getPathwayMapPng, Some(HTTPSupervisorStrategy), workersNumber = 8)
  .sink("sinkPng", id => println(s"PNG map downloaded for $id"))

val savePathwayTextFlow = PersistentWorkflow.connector[String]("textConnector")
  .map("getTxt", getPathwayDetails, Some(HTTPSupervisorStrategy), workersNumber = 8)
  .sink("sinkTxt", id => println(s"TXT details downloaded for pathway $id"))

PersistentWorkflow
  .source("source", List("hsa"))
  .map("getSetOfPathways", getSetOfPathways, Some(HTTPSupervisorStrategy))
  .split[String]("split")
  .broadcast("broadcast", savePathwayPngFlow, savePathwayTextFlow)
  .run
```
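The data flow of this example can be traced on plain collections:

- The source emits one organism code.
- `getSetOfPathways` turns it into a collection of pathway ids.
- `split` emits each id as its own element.
- `broadcast` sends every id to both connector flows.

The function bodies and the ids `hsa00010`/`hsa00020` below are hypothetical stubs with no HTTP calls. The sequential order is also a simplification, since the real stages run concurrently.

```scala
import scala.collection.mutable.ListBuffer

val log = ListBuffer.empty[String]

// Hypothetical stubs for the HTTP-calling functions (no network access here).
def getSetOfPathways(organism: String): List[String] = List(s"${organism}00010", s"${organism}00020")
def getPathwayMapPng(id: String): String = id  // pretend the PNG was saved; return its id
def getPathwayDetails(id: String): String = id // pretend the text was saved; return its id

// Each connector flow reduced to "run the activity, then the sink".
val savePathwayPngFlow: String => Unit =
  id => log += s"PNG map downloaded for ${getPathwayMapPng(id)}"
val savePathwayTextFlow: String => Unit =
  id => log += s"TXT details downloaded for pathway ${getPathwayDetails(id)}"

List("hsa")                      // source
  .map(getSetOfPathways)         // one element: the collection of pathway ids
  .flatten                       // split: every id becomes a separate element
  .foreach { id =>               // broadcast: every id goes to both flows
    savePathwayPngFlow(id)
    savePathwayTextFlow(id)
  }

log.foreach(println)
```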
Real world example - scaling
```scala
import akka.actor.AddressFromURIString

val remoteWorkersHostLocations = Seq(
  AddressFromURIString("akka.tcp://workersActorSystem@localhost:5150"),
  AddressFromURIString("akka.tcp://workersActorSystem@localhost:5151")
)

val HTTPSupervisorStrategy = OneForOneStrategy(10, 10.seconds) {
  case _: TimeoutException => Restart // try to perform the operation again
  case _                   => Stop    // drop the message
}

// Only the PNG stage is given remote addresses; the text stage stays local
val savePathwayPngFlow = PersistentWorkflow.connector[String]("pngConnector")
  .map("getPng", getPathwayMapPng, Some(HTTPSupervisorStrategy),
    workersNumber = 8, remoteAddresses = remoteWorkersHostLocations)
  .sink("sinkPng", id => println(s"PNG map downloaded for $id"))

val savePathwayTextFlow = PersistentWorkflow.connector[String]("textConnector")
  .map("getTxt", getPathwayDetails, Some(HTTPSupervisorStrategy), workersNumber = 8)
  .sink("sinkTxt", id => println(s"TXT details downloaded for pathway $id"))

PersistentWorkflow
  .source("source", List("hsa"))
  .map("getSetOfPathways", getSetOfPathways, Some(HTTPSupervisorStrategy))
  .split[String]("split")
  .broadcast("broadcast", savePathwayPngFlow, savePathwayTextFlow)
  .run
```
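The slide does not show how the eight `getPng` workers are placed on the two remote worker systems. The sketch below assumes a round-robin, and therefore even, spread over the given addresses, and simply counts workers per host. `placement` is a hypothetical helper, not part of Scaflow or Akka.

```scala
// Hypothetical: assign worker i to host (i mod number of hosts) and count per host.
def placement(workersNumber: Int, hosts: Seq[String]): Map[String, Int] =
  (0 until workersNumber)
    .map(i => hosts(i % hosts.size))
    .groupBy(identity)
    .map { case (host, workers) => host -> workers.size }

val remoteHosts = Seq(
  "akka.tcp://workersActorSystem@localhost:5150",
  "akka.tcp://workersActorSystem@localhost:5151"
)
val perHost = placement(8, remoteHosts)
perHost.toList.sorted.foreach { case (host, n) => println(s"$host -> $n workers") }
```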
Event sourcing - performance
Future development
- Graph API
- Extensive use of Akka Cluster
- Workflow monitoring
https://github.com/liosedhel/scaflow
Special thanks go to
Ph.D. Bartosz Baliś
from AGH University
for the idea, supervision of the work, advice, and all the other help during the research and the development of Scaflow
Bibliography
- Akka - http://akka.io/
- Cassandra - http://cassandra.apache.org/
- Pegasus - http://pegasus.isi.edu/
- Taverna - http://www.taverna.org.uk/
- Kepler - https://kepler-project.org/
- Akka Streams - http://doc.akka.io/docs/akka-stream-and-http-experimental/2.0.3/scala.html
Scaflow-LambdaDays2016
By liosedhel