DDD/CQRS/ES with Akka - Lessons learned


 


Andrzej Dębski,  Bartłomiej Szczepanik

AGENDA

  1. Quick intro to crucial concepts (if needed)
  2. Quick recap of our work
  3. Main challenge
  4. Thoughts, lessons learned
  5. Future work




CQRS/ES

 

domain-driven design


  • Set of strategic and tactical patterns in software engineering
  • More and more popular
  • Most important strategic patterns:
    • Bounded context & context map
    • Ubiquitous language
    • Polyglot persistence
  • Most important tactical patterns:
    • All business logic resides in the "domain model" 
    • Commands & events
    • Aggregates
    • Long running process

Command-query responsibility segregation (CQRS)



CQRS advantages

  • Different databases on read and write side
  • Separation of different query use cases
  • Database tailored to the use case
  • Lower latency of queries
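
A minimal sketch of the read-side idea, in plain Scala with no Akka or database involved: the write side only emits events, and each read model keeps a structure tailored to one query. All names here (`LegScheduled`, `LegsByAirport`, `LegCountByAirplane`) and the sample data are hypothetical and only illustrate the shape.

```scala
object CqrsSketch {
  // Event emitted by the write side (hypothetical example event).
  final case class LegScheduled(airplane: String, from: String, to: String)

  // Read model 1: answers "which legs depart from airport X?"
  final case class LegsByAirport(index: Map[String, Vector[LegScheduled]] = Map.empty) {
    def applyEvent(e: LegScheduled): LegsByAirport =
      copy(index = index.updated(e.from, index.getOrElse(e.from, Vector.empty) :+ e))
    def departuresFrom(airport: String): Vector[LegScheduled] =
      index.getOrElse(airport, Vector.empty)
  }

  // Read model 2: answers "how many legs does airplane Y fly?"
  final case class LegCountByAirplane(counts: Map[String, Int] = Map.empty) {
    def applyEvent(e: LegScheduled): LegCountByAirplane =
      copy(counts = counts.updated(e.airplane, counts.getOrElse(e.airplane, 0) + 1))
    def legsOf(airplane: String): Int = counts.getOrElse(airplane, 0)
  }

  val events = Vector(
    LegScheduled("SP-LRA", "KRK", "WAW"),
    LegScheduled("SP-LRA", "WAW", "KRK"),
    LegScheduled("SP-LRB", "KRK", "GDN"))

  // Both read models are fed from the same event stream.
  val byAirport = events.foldLeft(LegsByAirport())((m, e) => m.applyEvent(e))
  val byAirplane = events.foldLeft(LegCountByAirplane())((m, e) => m.applyEvent(e))
}
```

In a real deployment each read model would typically live in its own store; the in-memory maps stand in for that.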


CQRS DISADVANTAGES

  • Eventual consistency (?)
  • Possible code duplication
  • More components to maintain
  • Uniqueness constraint problem

event sourcing


ES advantages

  • Full event log for free
  • Append-only is enough
  • Makes it possible to add more CQRS read models in the future
  • No object-relational impedance mismatch
  • Event storming model maps 1:1 with ES
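
The core mechanism behind these points can be sketched as a left fold over an append-only log. This is plain Scala, not the Akka Persistence API; the event and state types (`LegAdded`, `LegRemoved`, `Rotation`) are hypothetical.

```scala
object EventSourcingSketch {
  sealed trait Event
  final case class LegAdded(flightNumber: String) extends Event
  final case class LegRemoved(flightNumber: String) extends Event

  // Current state is never stored directly; it is derived from events.
  final case class Rotation(legs: Vector[String] = Vector.empty) {
    def updated(event: Event): Rotation = event match {
      case LegAdded(fn)   => copy(legs = legs :+ fn)
      case LegRemoved(fn) => copy(legs = legs.filterNot(_ == fn))
    }
  }

  // The journal is append-only: new events go at the end, nothing is edited.
  val journal: Vector[Event] =
    Vector(LegAdded("LO123"), LegAdded("LO124"), LegRemoved("LO123"))

  // Replaying the full log yields the current state.
  val current: Rotation = journal.foldLeft(Rotation())(_.updated(_))

  // Replaying a prefix yields the state at an earlier point in time.
  val afterTwoEvents: Rotation = journal.take(2).foldLeft(Rotation())(_.updated(_))
}
```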

ES DISADVANTAGES

  • Requires fine-grained model in order to be performant
  • Performance issues after some time (snapshots help)
  • Upcasting needed when the event format changes
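
A rough sketch of how snapshots and upcasting fit into recovery, again in plain Scala. The two event versions and the flight-number format (letters followed by digits) are assumptions made for the example, not the format used in the project.

```scala
object UpcastingSketch {
  // Stored representations: V1 is the old format, V2 the current one.
  sealed trait Stored
  final case class LegAddedV1(flight: String) extends Stored // e.g. "LO123"
  final case class LegAddedV2(carrier: String, number: Int) extends Stored

  // Upcaster: converts any stored version to the current event format.
  // Assumes V1 values look like letters followed by digits.
  def upcast(stored: Stored): LegAddedV2 = stored match {
    case LegAddedV1(flight) =>
      val (carrier, number) = flight.span(_.isLetter)
      LegAddedV2(carrier, number.toInt)
    case current: LegAddedV2 => current
  }

  final case class State(legs: Vector[LegAddedV2] = Vector.empty) {
    def updated(e: LegAddedV2): State = copy(legs = legs :+ e)
  }

  // A snapshot stores the state together with the number of events it covers.
  final case class Snapshot(state: State, eventCount: Int)

  // Recovery starts from the snapshot and replays only the events after it.
  def recover(snapshot: Option[Snapshot], journal: Vector[Stored]): State = {
    val start = snapshot.getOrElse(Snapshot(State(), 0))
    journal.drop(start.eventCount).map(upcast).foldLeft(start.state)(_.updated(_))
  }

  val journal: Vector[Stored] =
    Vector(LegAddedV1("LO123"), LegAddedV1("LO124"), LegAddedV2("LO", 125))

  val fromScratch = recover(None, journal)
  val fromSnapshot =
    recover(Some(Snapshot(recover(None, journal.take(2)), 2)), journal)
}
```

Both recovery paths end in the same state; the snapshot only shortens the replay.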




AKKA

 

  • A toolkit,  not a framework
  • Distributed by design
  • Scala and Java API
  • Message passing style
  • Actor concurrency model
  • Supervision hierarchy (let it crash)
  • Location transparency

Actor model

  • Active object flavour
  • ActorRef , location transparency
  • Message passing
  • Mailbox - message queue
  • Unit of concurrency
  • Hundreds of thousands of instances
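
import akka.actor.{Actor, ActorSystem, Props}
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.duration._

class MyActor(magicNumber: Int) extends Actor {
  def receive = {
    case x: Int => sender() ! (x + magicNumber)
  }
}

val system = ActorSystem("mySystem")
// MyActor has a constructor argument, so it must be passed via Props (42 is arbitrary)
val myActor = system.actorOf(Props(new MyActor(42)), "myactor2")

myActor ! 97 // tell: fire and forget
implicit val timeout: Timeout = Timeout(5.seconds) // required by ask (?)
val futureResponse = (myActor ? 2).mapTo[Int]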

akka routers

  • Pools and groups
  • Round robin routing
  • Consistent hashing routing
  • Broadcast routing
  • Balancing routing

akka.actor.deployment {
  /parent/router3 {
    router = round-robin-group
    routees.paths = ["/user/workers/w1", "/user/workers/w2", "/user/workers/w3"]
  }
}

val router3: ActorRef = context.actorOf(FromConfig.props(), "router3")
router3 ! Work()

akka modules

  • Akka streams (Rx)
  • Akka HTTP
  • Akka clustering
  • Distributed Publish Subscribe in Cluster
  • Akka cluster sharding
  • Akka persistence

akka persistence

  • Persistence mechanism for actors
  • Based on Command/Event Sourcing concept
  • PersistentActor
    • defines "persistenceId"
    • persists messages/events bound to the id
    • when created, all correlated messages are replayed
  • Variety of journal plugins
  • Views
  • Snapshots
  • No support for CQRS and upcasting

AKKA Persistence

class ExamplePersistentActor extends PersistentActor {
  override def persistenceId = "sample-id-1"
 
  var state = ExampleState()
  def updateState(event: Evt): Unit = { state = state.updated(event) }
 
  def numEvents = state.size

  val receiveRecover: Receive = {
    case evt: Evt                                 => updateState(evt)
    case SnapshotOffer(_, snapshot: ExampleState) => state = snapshot
  }

  val receiveCommand: Receive = {
    case Cmd(data) =>
      persist(Evt(s"${data}-${numEvents}"))(updateState)
      persist(Evt(s"${data}-${numEvents + 1}")) { event =>
        updateState(event)
        context.system.eventStream.publish(event)
      }
    case "snap"  => saveSnapshot(state)
    case "print" => println(state)
  } 
}

akka clustering

  • P2P (gossip) clustering protocol
  • Membership service
  • Automatic failure detection
  • Cluster-aware routers
  • JMX metrics

class SimpleClusterListener extends Actor with ActorLogging {
  val cluster = Cluster(context.system)
 
  override def preStart(): Unit = cluster.subscribe(
      self, initialStateMode = InitialStateAsEvents, classOf[MemberEvent])
  override def postStop(): Unit = cluster.unsubscribe(self)
 
  def receive = {
    case MemberUp(member) => log.info("Member is Up: {}", member.address)
    case MemberRemoved(member, previousStatus) =>
      log.info("Member is Removed: {} after {}", member.address, previousStatus)
    case _: MemberEvent => // ignore
  }
}

akka cluster sharding

  • Shards of stateful actors
  • ShardRegion and ShardCoordinator services
  • Rebalancing shards using akka-persistence
  • Passivation of actors

// The extractors are defined first so that they are initialized before start() uses them
val idExtractor: ShardRegion.IdExtractor = {
  case EntryEnvelope(id, payload) ⇒ (id.toString, payload)
  case msg @ Get(id)              ⇒ (id.toString, msg)
}

val shardResolver: ShardRegion.ShardResolver = msg ⇒ msg match {
  case EntryEnvelope(id, _) ⇒ (id % 10).toString
  case Get(id)              ⇒ (id % 10).toString
}

ClusterSharding(system).start(
  typeName = "Counter",
  entryProps = Some(Props[Counter]),
  idExtractor = idExtractor,
  shardResolver = shardResolver)
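
The shard resolver above works on numeric ids. For string ids, a common approach is to hash the id into a fixed number of shards. The sketch below is plain Scala and independent of the Akka API; `numberOfShards` and the sample ids are assumptions for illustration.

```scala
object ShardIdSketch {
  // Assumption: a fixed number of shards chosen up front.
  val numberOfShards = 10

  // Maps an entity id to a shard id. floorMod keeps the result non-negative
  // even when hashCode is negative.
  def shardIdFor(entityId: String): String =
    java.lang.Math.floorMod(entityId.hashCode, numberOfShards).toString

  val rotationIds = Vector("rotation-1", "rotation-2", "rotation-3", "rotation-42")

  // Which entities end up in which shard.
  val placement: Map[String, Vector[String]] = rotationIds.groupBy(shardIdFor)
}
```

The mapping has to stay stable for the lifetime of the cluster, which is why the number of shards is a constant here.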




implementation

 

UBIQUITOUS LANGUAGE

  • An airplane is assigned to a rotation. 
  • A rotation consists of legs.
  • A leg is a directed relocation of an airplane
    between two airports on a given date.

  • A flight consists of legs and has a flight designator.
  • Each airport defines a standard ground time, which is the minimum time that an airplane has to spend on the ground between consecutive legs.
  • One can check whether all legs in a rotation satisfy the continuity property, do not violate standard ground times, and have no duplicated flight numbers.
  • Schedule can be imported from SSIM file (industry standard)
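
The vocabulary above maps fairly directly onto case classes. The following is a simplified sketch, not the project's actual model: field names are guesses, legs are assumed to be sorted by departure time, and flight designators are reduced to a plain flight number per leg.

```scala
import java.time.{Duration, LocalDateTime}

object RotationChecks {
  final case class Airport(code: String, standardGroundTime: Duration)
  final case class Leg(flightNumber: String, from: Airport, to: Airport,
                       departure: LocalDateTime, arrival: LocalDateTime)

  final case class Rotation(legs: Vector[Leg]) {
    private def consecutive: Vector[(Leg, Leg)] = legs.zip(legs.drop(1))

    // Continuity: each leg departs from the airport where the previous one arrived.
    def isContinuous: Boolean =
      consecutive.forall { case (prev, next) => prev.to.code == next.from.code }

    // Ground time: time on the ground is at least the airport's standard ground time.
    def respectsGroundTimes: Boolean =
      consecutive.forall { case (prev, next) =>
        val onGround = Duration.between(prev.arrival, next.departure)
        onGround.compareTo(prev.to.standardGroundTime) >= 0
      }

    // Flight numbers must not repeat within the rotation.
    def hasUniqueFlightNumbers: Boolean =
      legs.map(_.flightNumber).distinct.size == legs.size

    def isValid: Boolean = isContinuous && respectsGroundTimes && hasUniqueFlightNumbers
  }

  val krk = Airport("KRK", Duration.ofMinutes(40))
  val waw = Airport("WAW", Duration.ofMinutes(45))
  val day = LocalDateTime.of(2015, 5, 1, 0, 0)

  val good = Rotation(Vector(
    Leg("LO1", krk, waw, day.withHour(8), day.withHour(9)),
    Leg("LO2", waw, krk, day.withHour(10), day.withHour(11))))

  // Second leg leaves only 30 minutes after arrival: violates WAW's 45 minutes.
  val tooTight = Rotation(Vector(
    Leg("LO1", krk, waw, day.withHour(8), day.withHour(9)),
    Leg("LO2", waw, krk, day.withHour(9).withMinute(30), day.withHour(11))))
}
```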

CONTINUITY CHECK EXAMPLE


technology stack

APPLICATION ARCHITECTURE


WRITE model design

  • Airplane and Rotation aggregates
  • (Persistent) Actor = Aggregate
  • Scala case classes = Value Objects, Events, Commands
  • Publishing domain events, e.g. RotationAdded
  • Separated domain from infrastructural concerns
  • Hexagonal architecture
  • REST API

read model design

  • Graph-oriented database (Neo4j)
  • Denormalization of events from event bus
  • REST API

READ MODEL Scalability

  • Simple replication of read model instances
  • Round robin load balancing
  • Took advantage of the replayability

WRITE MODEL SCALABILITY

  • Aggregate roots sharding and rebalancing
  • Round robin routers as load balancers
  • Scalable event store - Cassandra





replayable event bus

main challenge


distributed event bus

  • Message delivery is not an issue
    • DistributedPubSub Akka extension
    • ZeroMQ
    • RabbitMQ
  • We need to replay events from the past

event bus #1

Akka Persistence Views

  • Views can replay only events for a single persistent actor
  • Views are polling the event store
  • This will change in 2015 Q3 (see Akka Roadmap) [21]

EVENT BUS #2

Apache Kafka as an event store

Nearly perfect solution! But...

Kafka was not designed with this use case in mind: [51]
  • Retention time is usually 1-14 days
  • Maximum number of partitions is way too low
  • Designed mainly for log processing


EVENT BUS #3

Kafka + Cassandra tandem

  1. Replay past events from Cassandra
  2. Subscribe to Kafka

We can miss events! 



EVENT BUS #3

Kafka + Cassandra tandem

  • We leveraged Kafka durability
  • Kafka retention time set to X (e.g. 24h)
  • Subscribing from scratch in Kafka after Cassandra replay
  • Filtering duplicate events
  • If replay takes less than X we won't miss any event
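
The replay-then-subscribe procedure can be sketched as follows. The two iterators are stand-ins for the Cassandra replay and for a Kafka topic read from its earliest retained offset; they are assumptions for the example, not real client APIs, and events are assumed to carry a unique id.

```scala
object ReplayThenSubscribe {
  final case class Event(id: Long, payload: String)

  // `store` plays the role of Cassandra, `log` the role of a Kafka topic
  // consumed from scratch after the store replay has finished.
  def catchUp(store: Iterator[Event], log: Iterator[Event]): Vector[Event] = {
    val seen = scala.collection.mutable.HashSet.empty[Long]
    val out = Vector.newBuilder[Event]
    // 1. Replay history; 2. read the log from scratch, skipping duplicates.
    (store ++ log).foreach { e =>
      if (seen.add(e.id)) out += e
    }
    out.result()
  }

  // Events 1-3 are already in the store; 4 and 5 were published during replay.
  val store = Vector(Event(1, "a"), Event(2, "b"), Event(3, "c"))
  // The log still retains events 2-5, so it overlaps with the store.
  val log = Vector(Event(2, "b"), Event(3, "c"), Event(4, "d"), Event(5, "e"))

  val delivered = catchUp(store.iterator, log.iterator)
}
```

The overlap between the two sources is what prevents gaps, which is why the log retention has to be longer than the replay.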

EVENT BUS #4

Cassandra subscribed to Kafka

  • We could still lose an event
  • Cassandra eventually gets all events
  • During replay of write model aggregates
     we replay both from Cassandra and Kafka

STILL ROOM FOR IMPROVEMENT

  • Events ordering between aggregates [58]
  • Single Kafka topic/partition for now
  • Subscription only to all events
    • often we need to listen for specific events only
      (e.g. from a single aggregate)
    • a database is better at filtering data
    • unnecessary traffic
  • Not optimal for simultaneous replays on different nodes
  • ATOM interface
  • Spark connectors  




lessons learned

and those still not learned...

 

distributed ddd

  • Pat Helland's entities [45] match the aggregate definition
  • CQRS makes scalability and distribution easier
  • Saga/Business process is hard to implement efficiently
  • Application services need to be replicated
  • A single aggregate may still be a bottleneck
    in terms of latency or availability:
    • Possible solution: CRDTs [20, 46, 47]
  • Guaranteed delivery is tricky
    • idempotency to the rescue
    • idempotent read models are not trivial
    • transactional reads from a queue
  • DDD and microservices
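
To make the CRDT suggestion above concrete, here is a sketch of the simplest well-known CRDT, a grow-only counter (G-Counter). It is a textbook structure shown for illustration; the deck does not say which CRDTs, if any, would suit the aggregates in this project, and the replica names are made up.

```scala
object GCounterSketch {
  // Grow-only counter: one slot per replica, each replica increments its own slot.
  final case class GCounter(slots: Map[String, Long] = Map.empty) {
    def increment(replica: String): GCounter =
      copy(slots = slots.updated(replica, slots.getOrElse(replica, 0L) + 1))

    def value: Long = slots.values.sum

    // Merge takes the per-replica maximum; it is commutative, associative
    // and idempotent, so replicas converge regardless of delivery order.
    def merge(other: GCounter): GCounter =
      GCounter((slots.keySet ++ other.slots.keySet).map { r =>
        r -> math.max(slots.getOrElse(r, 0L), other.slots.getOrElse(r, 0L))
      }.toMap)
  }

  // Two nodes accept updates independently, without coordination.
  val onNodeA = GCounter().increment("a").increment("a")
  val onNodeB = GCounter().increment("b")

  val merged = onNodeA.merge(onNodeB)
}
```

Because merging the same state twice changes nothing, duplicate delivery is harmless here, which ties in with the idempotency point above.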

ddd implementation concerns

  • Actors in the domain code? NO! [27]
  • Prefer TypedIdClasses over UUID
  • What to do when command validation fails?
    • Return error? 
    • Throw exception? 
    • Publish event?
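
One way to combine these points, shown only as a sketch: the aggregate is a plain case class with no actor code, ids are wrapped in a typed class, and a failed validation is returned as a value (the "return error" option). The question of which option is best stays open; `RotationId`, `AddLeg`, `LegAdded` and `Rejected` are hypothetical names.

```scala
import java.util.UUID

object CommandHandlingSketch {
  // Typed id instead of a bare UUID: the compiler keeps ids of different aggregates apart.
  final case class RotationId(value: UUID)

  final case class AddLeg(rotation: RotationId, flightNumber: String)
  final case class LegAdded(rotation: RotationId, flightNumber: String)
  final case class Rejected(reason: String)

  // Plain domain object: no actors, no infrastructure.
  final case class Rotation(id: RotationId, flightNumbers: Set[String] = Set.empty) {
    // "Return error" variant: validation failure is a value, not an exception.
    def handle(cmd: AddLeg): Either[Rejected, LegAdded] =
      if (flightNumbers.contains(cmd.flightNumber))
        Left(Rejected(s"Flight number ${cmd.flightNumber} already used"))
      else
        Right(LegAdded(id, cmd.flightNumber))

    def updated(e: LegAdded): Rotation = copy(flightNumbers = flightNumbers + e.flightNumber)
  }

  val id = RotationId(UUID.randomUUID())
  val rotation = Rotation(id).updated(LegAdded(id, "LO1"))

  val accepted = rotation.handle(AddLeg(id, "LO2"))
  val rejected = rotation.handle(AddLeg(id, "LO1"))
}
```

A persistent actor wrapping such an object would only persist the event on `Right` and reply with the rejection on `Left`.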

akka stuff

  • Avoid ask pattern if possible [22, 24] 
    • timeout hell
    • performance issues
    • tell, don't ask
  • It's not easy to manage dependencies
    • We don't like cake pattern!
  • How should we handle stateless business logic?
    • Actors behind routers? Futures? Static classes?
      Continuation monad? Dataflow? [23, 25, 26]
  • Cluster sharding is not fully dynamic yet
    • Cannot change number of shards
      without restarting the app

testing

  • Akka Multi-JVM/Multi-Node testing toolkit
  • Test Data Builder Pattern rocks! [59]
  • Start with integration tests on application service level
  • Given/When/Then fits DDD perfectly: [30]
    • given past events
    • when command fired
    • then expected event(s)
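
The given/when/then shape can be expressed with a small helper when the aggregate is a plain object. This is a sketch without any test framework; `scenario` and the toy `Rotation` are hypothetical.

```scala
object GivenWhenThenSketch {
  sealed trait Event
  final case class LegAdded(flightNumber: String) extends Event

  final case class AddLeg(flightNumber: String)

  final case class Rotation(flightNumbers: Set[String] = Set.empty) {
    def updated(e: Event): Rotation = e match {
      case LegAdded(fn) => copy(flightNumbers = flightNumbers + fn)
    }
    // Duplicate flight numbers produce no events in this simplified model.
    def handle(cmd: AddLeg): Vector[Event] =
      if (flightNumbers.contains(cmd.flightNumber)) Vector.empty
      else Vector(LegAdded(cmd.flightNumber))
  }

  // Given past events, when a command is fired, return the resulting events.
  def scenario(pastEvents: Event*)(command: AddLeg): Vector[Event] =
    pastEvents.foldLeft(Rotation())(_.updated(_)).handle(command)

  val fresh = scenario(LegAdded("LO1"))(AddLeg("LO2"))
  val duplicate = scenario(LegAdded("LO1"))(AddLeg("LO1"))
}
```

The "then" step is an ordinary comparison of the returned events with the expected ones.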

TESTING #2

  • Eventual consistency forces you to wait in tests :(
    • Sometimes it is possible to avoid it
    • e.g.: waiting on expected number of entities
  • Testing timeouts is painful
  • Where to put e2e tests?
    • Completely outside of the app?
    • In the REST port?

other lessons

  • EJB and Akka are in fact similar! [19-20]
    • Active Object
    • #unpopularopinion 
  • Eventual consistency 
    • often is feasible and realistic
    • introduces new problems
  • Distribution complicates things a lot, even with a good design
  • Performance measurement is not straightforward
  • App monitoring is challenging
  • Automated deployment is crucial
  • No vendor-neutral autoscaling solution available

FUTURE work


  • Open source - on the way!
  • More detailed performance evaluation
  • Replayable event bus improvements
    • ATOM, Spark Streaming, Redis
  • Causal consistency [58]
  • Effective sagas implementation
  • Effective stateless logic implementation
  • Upcasting and snapshotting [33]
  • Geo-based sharding

Spin-offs

  • Maciek & Adam
    • Integration with PaaSage platform
    • Metrics exposure
    • Second read model
    • More detailed performance evaluation
  • Mariusz & Michał [55]
    • Akka debugging tool
    • Akka-tracing enhancement [54]

DDD/CQRS/ES with Akka - LESSONS LEARNED








Andrzej Dębski
andrzejdebski91 @ gmail.com

Bartłomiej Szczepanik
mequrel @ gmail.com, @bszczepanik

References


http://goo.gl/EFFU9N
