opportunities and challenges
author: Krzysztof Borowski
supervisor: Bartosz Baliś, Ph.D.
@liosedhel
Aspect | Flow activity | Actors |
---|---|---|
Input data | Many typed input channels | A single input mailbox |
Output data | Many typed output channels | No output channels |
Flow patterns | Complex patterns | Simple asynchronous "fire-and-forget" messages |
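One way to bridge the first row of the table is to tag every message with the input port it belongs to, so that one mailbox carries several logically typed channels. The sketch below is hypothetical: `Input`, `Numbers`, `Label` and `TwoPortActivity` are illustrative names, not part of scaflow, and the "actor" is a plain class with a `receive` method.

```scala
// Hypothetical sketch: several typed input channels multiplexed over one mailbox.
// Each message is wrapped in a case class that names its input port.
sealed trait Input
final case class Numbers(values: List[Int]) extends Input // port 1: numeric data
final case class Label(text: String) extends Input        // port 2: textual data

// A single receive function, as an actor's mailbox would see it,
// pattern-matches on the wrapper type to recover per-channel typing.
final class TwoPortActivity {
  private var numbers: Option[List[Int]] = None
  private var label: Option[String] = None

  // Produces an output as soon as both ports hold a value.
  def receive(msg: Input): Option[String] = {
    msg match {
      case Numbers(v) => numbers = Some(v)
      case Label(t)   => label = Some(t)
    }
    for (n <- numbers; l <- label) yield s"$l: ${n.sum}"
  }
}
```

The activity stays silent until both ports have delivered data, which is the behaviour a workflow activity with two typed inputs usually needs and which a bare actor does not provide on its own.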
Scientific workflows | Akka Streams |
---|---|
Bounded input data set | Unbounded data stream |
Large data elements | Small data elements |
Focused on scaling | Focused on back-pressure (a Reactive Streams implementation) |
Recovery mechanisms are essential | High message throughput is essential |
Source
Data processing
Data filtering
Data grouping
Broadcast data
Merge data
Synchronize data
Data sink
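Two of these patterns are easy to confuse: merge forwards elements from any input as they arrive, while synchronize waits until every input has an element. A minimal sketch on finite lists, assuming a simple interleaving stands in for arrival order (the helper names are hypothetical, not scaflow's API):

```scala
// Merge: elements from both inputs end up in one output stream.
// Without real timing, arrival order is modelled by interleaving.
def merge[A](left: List[A], right: List[A]): List[A] =
  (left, right) match {
    case (l :: ls, r :: rs) => l :: r :: merge(ls, rs)
    case (Nil, rs)          => rs
    case (ls, Nil)          => ls
  }

// Synchronize: an output is emitted only when every input has an element,
// so elements are combined pairwise; in this finite model, surplus
// elements on the longer input are dropped.
def synchronize[A, B](left: List[A], right: List[B]): List[(A, B)] =
  left.zip(right)
```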
```scala
StandardWorkflow.source(List(1, 2, 3, 4, 5, 6))
  .map(a => a * a)
  .group(3)
  .map(_.sum)
  .sink(println)
  .run
```
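Assuming `group(3)` collects consecutive elements into groups of three, the workflow computes the same result as the plain collection chain below. This only illustrates the data semantics; it is not scaflow's actor-based implementation.

```scala
// Plain Scala equivalent of the StandardWorkflow pipeline (semantics only).
val squares = List(1, 2, 3, 4, 5, 6).map(a => a * a) // List(1, 4, 9, 16, 25, 36)
val sums = squares.grouped(3).map(_.sum).toList       // each group of 3 is summed
sums.foreach(println)                                 // plays the role of the sink
```

Under that assumption the sink prints `14` (1 + 4 + 9) and then `77` (16 + 25 + 36).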
Push model
Pull model
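The difference between the two models can be sketched without any actors: in the push model the producer decides the pace and the consumer must buffer everything, while in the pull model the consumer signals demand and never receives more than it asked for, which is the idea behind back-pressure. The names `pushAll` and `PullProducer` are hypothetical.

```scala
import scala.collection.mutable

// Push model: the producer emits everything immediately; the consumer's
// buffer has to absorb all of it, so it can grow without bound.
def pushAll(data: List[Int], consumerBuffer: mutable.Queue[Int]): Unit =
  data.foreach(x => consumerBuffer.enqueue(x))

// Pull model: the consumer requests n elements and the producer sends at
// most that many, so the consumer never holds more than it asked for.
final class PullProducer(data: List[Int]) {
  private var remaining = data

  def request(n: Int): List[Int] = {
    val (batch, rest) = remaining.splitAt(n)
    remaining = rest
    batch
  }
}
```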
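```scala
// Recovery: each persisted NextVal is replayed, filtered again and
// re-delivered, so messages accepted before a crash are not lost.
override def receiveRecover: Receive = receiveRecoverWithAck {
  case n: NextVal[A] =>
    if (filter(n.data)) deliver(destination, n)
}

// Normal operation: persist the incoming value first, then deliver it
// downstream only if it passes the filter.
override def receiveCommand: Receive = receiveCommandWithAck {
  case n: NextVal[A] =>
    persistAsync(n) { e =>
      if (filter(e.data)) deliver(destination, e)
    }
}
/* ... */
```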
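The guarantee this filter step aims for is at-least-once delivery. The sketch below models it in memory under simplifying assumptions: an `ArrayBuffer` stands in for the persistent journal, `NextVal` is redefined locally, and `FilterStep`, `confirm` and `recover` are hypothetical names. Confirmations are not journaled here, so after recovery every filtered value is pending again: duplicates are possible, losses are not.

```scala
import scala.collection.mutable

final case class NextVal[A](data: A)

final class FilterStep[A](filter: A => Boolean) {
  val journal = mutable.ArrayBuffer.empty[NextVal[A]]             // stands in for persistence
  val unconfirmed = mutable.LinkedHashMap.empty[Long, NextVal[A]] // pending deliveries
  private var nextDeliveryId = 0L

  private def deliver(n: NextVal[A]): Unit = {
    nextDeliveryId += 1
    unconfirmed(nextDeliveryId) = n // kept until the destination confirms it
  }

  // Normal operation: persist first, then deliver if the filter passes.
  def receiveCommand(n: NextVal[A]): Unit = {
    journal += n
    if (filter(n.data)) deliver(n)
  }

  // The destination acknowledged a delivery; stop redelivering it.
  def confirm(deliveryId: Long): Unit = unconfirmed -= deliveryId

  // Values that would be re-sent on the next redelivery attempt.
  def pending: List[A] = unconfirmed.values.map(_.data).toList
}

// After a crash, a fresh step replays the journal (the role of receiveRecover).
def recover[A](journal: Seq[NextVal[A]], filter: A => Boolean): FilterStep[A] = {
  val step = new FilterStep[A](filter)
  journal.foreach(step.receiveCommand)
  step
}
```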
```scala
PersistentWorkflow.source("source", List(1, 2, 3, 4, 5, 6))
  .map("square", a => a * a)
  .group("group", 3)
  .map("sum", _.sum)
  .sink(println)
  .run
```
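Unlike `StandardWorkflow`, every step here carries a name. A plausible reason, stated as an assumption, is that the name identifies the step's persisted state, so a restarted step can find it again. The sketch models this for the stateful `group` step with an in-memory map standing in for durable storage; `journal` and `PersistentGroup` are hypothetical.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a durable journal: step name -> buffered elements.
val journal = mutable.Map.empty[String, Vector[Int]]

// A "group" step whose partial buffer is stored under its name, so a new
// instance created with the same name resumes where the old one stopped.
final class PersistentGroup(name: String, size: Int) {
  def onNext(x: Int): Option[Vector[Int]] = {
    val buffer = journal.getOrElse(name, Vector.empty) :+ x
    if (buffer.size == size) {
      journal(name) = Vector.empty // group complete: clear the stored buffer
      Some(buffer)
    } else {
      journal(name) = buffer       // store the partial group
      None
    }
  }
}
```

For example, if a step named `"group"` has buffered 1 and 4 and then crashes, a replacement created with the same name emits `Vector(1, 4, 9)` on receiving 9.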
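```scala
import java.util.concurrent.TimeoutException
import akka.actor.OneForOneStrategy
import akka.actor.SupervisorStrategy.{Restart, Stop}
import scala.concurrent.duration._

// At most 10 restarts within 5 seconds; beyond that the worker is stopped.
val HTTPSupervisorStrategy = OneForOneStrategy(10, 5.seconds) {
  case _: TimeoutException => Restart // retry the request
  case _                   => Stop    // drop the message
}

PersistentWorkflow.connector[String]("pngConnector")
  .map("getPng", getPathwayMapPng, Some(HTTPSupervisorStrategy))
  .sink("sinkPng", id => println(s"PNG map downloaded for $id"))
```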
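The decider above maps failures to actions: a timeout triggers a retry, anything else drops the message. A minimal actor-free sketch of that policy, assuming a retry budget equal to the first argument of `OneForOneStrategy` (the 5-second window is not modelled); `superviseCall` is a hypothetical helper.

```scala
import java.util.concurrent.TimeoutException
import scala.util.{Failure, Success, Try}

// Retry on TimeoutException while the budget lasts ("Restart");
// any other failure, or an exhausted budget, drops the message ("Stop").
def superviseCall[A](maxRetries: Int)(call: () => A): Option[A] = {
  @annotation.tailrec
  def attempt(retriesLeft: Int): Option[A] =
    Try(call()) match {
      case Success(value)                                  => Some(value)
      case Failure(_: TimeoutException) if retriesLeft > 0 => attempt(retriesLeft - 1)
      case Failure(_)                                      => None
    }
  attempt(maxRetries)
}
```

A download that times out twice and then succeeds returns its result after three attempts, while an `IllegalArgumentException` yields `None` immediately.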
```scala
val HTTPSupervisorStrategy = OneForOneStrategy(10, 10.seconds) {
  case _: TimeoutException => Restart // try to perform the operation again
  case _                   => Stop    // drop the message
}

val savePathwayPngFlow = PersistentWorkflow.connector[String]("pngConnector")
  .map("getPng", getPathwayMapPng, Some(HTTPSupervisorStrategy), workersNumber = 8)
  .sink("sinkPng", id => println(s"PNG map downloaded for $id"))

val savePathwayTextFlow = PersistentWorkflow.connector[String]("textConnector")
  .map("getTxt", getPathwayDetails, Some(HTTPSupervisorStrategy), workersNumber = 8)
  .sink("sinkTxt", id => println(s"TXT details downloaded for pathway $id"))
```
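The `workersNumber = 8` parameter lets up to eight slow HTTP calls run at the same time. The sketch below approximates that with a fixed pool of eight threads instead of worker actors; `parallelMap` and `slowDownload` are hypothetical, and the counters only exist to show that concurrency never exceeds the pool size.

```scala
import java.util.concurrent.Executors
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Run f on a fixed pool of workersNumber threads; results keep input order.
def parallelMap[A, B](items: Seq[A], workersNumber: Int)(f: A => B): Seq[B] = {
  val pool = Executors.newFixedThreadPool(workersNumber)
  implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
  try {
    val futures = items.map(item => Future(f(item)))
    Await.result(Future.sequence(futures), 1.minute)
  } finally pool.shutdown()
}

// Counters tracking how many calls are in flight at once.
val inFlight = new AtomicInteger(0)
val maxInFlight = new AtomicInteger(0)

def slowDownload(id: Int): String = {
  val now = inFlight.incrementAndGet()
  maxInFlight.accumulateAndGet(now, (a, b) => math.max(a, b))
  Thread.sleep(20) // simulated network latency
  inFlight.decrementAndGet()
  s"pathway-$id"
}
```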
```scala
PersistentWorkflow
  .source("source", List("hsa"))
  .map("getSetOfPathways", getSetOfPathways, Some(HTTPSupervisorStrategy))
  .split[String]("split")
  .broadcast("broadcast", savePathwayPngFlow, savePathwayTextFlow)
  .run
```
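The shape of this workflow can be checked without any HTTP calls: `split` turns the set of pathways into individual IDs and `broadcast` hands each ID to both downstream flows. In the sketch `getSetOfPathways` is a stub returning made-up IDs, and the two flows are reduced to functions producing the log lines used above.

```scala
// Stub: the real step queries a remote service; these IDs are made up.
def getSetOfPathways(organism: String): List[String] =
  List(s"$organism-p1", s"$organism-p2")

// Broadcast: every element is passed to each downstream flow.
def broadcast[A](elements: List[A], flows: (A => String)*): List[String] =
  for (e <- elements; flow <- flows.toList) yield flow(e)

val pngFlow: String => String = id => s"PNG map downloaded for $id"
val txtFlow: String => String = id => s"TXT details downloaded for pathway $id"

// flatMap plays the role of "split": one set becomes many elements.
val log = broadcast(List("hsa").flatMap(getSetOfPathways), pngFlow, txtFlow)
log.foreach(println)
```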
```scala
val remoteWorkersHostLocations =
  Seq(AddressFromURIString("akka.tcp://workersActorSystem@localhost:5150"),
      AddressFromURIString("akka.tcp://workersActorSystem@localhost:5151"))

val HTTPSupervisorStrategy = OneForOneStrategy(10, 10.seconds) {
  case _: TimeoutException => Restart // try to perform the operation again
  case _                   => Stop    // drop the message
}

val savePathwayPngFlow = PersistentWorkflow.connector[String]("pngConnector")
  .map("getPng", getPathwayMapPng, Some(HTTPSupervisorStrategy),
    workersNumber = 8, remoteAddresses = remoteWorkersHostLocations)
  .sink("sinkPng", id => println(s"PNG map downloaded for $id"))

val savePathwayTextFlow = PersistentWorkflow.connector[String]("textConnector")
  .map("getTxt", getPathwayDetails, Some(HTTPSupervisorStrategy), workersNumber = 8)
  .sink("sinkTxt", id => println(s"TXT details downloaded for pathway $id"))

PersistentWorkflow
  .source("source", List("hsa"))
  .map("getSetOfPathways", getSetOfPathways, Some(HTTPSupervisorStrategy))
  .split[String]("split")
  .broadcast("broadcast", savePathwayPngFlow, savePathwayTextFlow)
  .run
```
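Only the PNG flow receives `remoteAddresses`, so its eight workers can live in the remote actor systems while the text flow stays local. How scaflow places workers on those hosts is not shown here; the sketch assumes a simple round-robin placement and uses plain strings instead of `akka.actor.Address`. `placeWorkers` is a hypothetical helper.

```scala
val remoteWorkersHostLocations = Seq(
  "akka.tcp://workersActorSystem@localhost:5150",
  "akka.tcp://workersActorSystem@localhost:5151")

// Assumed policy: worker i runs on host i mod (number of hosts).
def placeWorkers(workersNumber: Int, hosts: Seq[String]): Map[String, Seq[Int]] =
  (0 until workersNumber).groupBy(i => hosts(i % hosts.size))

val placement = placeWorkers(8, remoteWorkersHostLocations)
placement.toSeq.sortBy(_._1).foreach { case (host, workers) =>
  println(s"$host -> workers ${workers.mkString(", ")}")
}
```

With two hosts and eight workers, each host gets four workers, which spreads the load of the slow PNG downloads across both machines.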
https://github.com/liosedhel/scaflow