The Data Dichotomy

(Micro)services

Services represent bounded contexts

  • Orders
  • Customers
  • Catalog

Each context has its own data

Webscale!

Services and datasets can be deployed and scaled independently

But...

Companies are a collection of services which must work together

How to share data?

Each context usually depends on one or more external datasets.


For example, Orders needs Customer data.

Lots of data services

Distributed Join Problem *

* Hard even for databases

API starts leaking

More coupling

  • getOrder(Id) 
  • getOrder(UserId) 
  • getAllOpenOrders() 
  • getAllOrdersUnfulfilled(Id) 
  • getAllOrders() 
  • getOpenOrders(fulfilled=false, deliveryLocation=CA, orderValue=100, operator=GreaterThan)

Duplication

Divergence over time

Other issues

Buffering, failure handling, backpressure, scaling, etc. get pushed into each service’s responsibilities

The synchronous world of Request/Response protocols leads to tight, point-to-point couplings.
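A minimal sketch of what that coupling looks like from the caller's side. The names `CustomerServiceDown`, `call_with_retry` and `flaky_get_customer` are hypothetical, invented for this illustration; the point is only that retry and failure handling end up living inside the calling service.

```python
import time
from typing import Callable


class CustomerServiceDown(Exception):
    """Hypothetical error raised when the remote Customers service is unavailable."""


def call_with_retry(fetch: Callable[[], dict], attempts: int = 3, delay_s: float = 0.0) -> dict:
    """Call a synchronous dependency, retrying with exponential backoff on failure."""
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except CustomerServiceDown as error:
            last_error = error
            time.sleep(delay_s * (2 ** attempt))
    raise RuntimeError("customer lookup failed") from last_error


# Stand-in for a remote call that fails twice before succeeding.
calls = {"count": 0}


def flaky_get_customer() -> dict:
    calls["count"] += 1
    if calls["count"] < 3:
        raise CustomerServiceDown()
    return {"customer_id": 1, "name": "Ada"}


# Orders cannot proceed until Customers answers, so Orders owns the retry policy.
customer = call_with_retry(flaky_get_customer)
```

Every service that needs Customer data has to carry a policy like this one.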

Which leads us to the Data Dichotomy

Dichotomy?

Contradiction

Services

Encapsulation encourages us to hide data, as it decouples services. But we also want to slice and dice shared data.

Data Systems

Data systems do not encapsulate; they expose as much as possible, and they do not manage complexity very well.

Data Dichotomy

Data systems are about exposing data.

Services are about hiding it.

- Ben Stopford

How can we address this?

Let's look at the options

Service Interfaces

Synchronous Request-Response = high coupling

Messaging

No history = divergence
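One way to picture this, as a sketch under assumptions: `FireAndForgetBus` is a hypothetical in-memory bus that delivers only to current subscribers and keeps nothing. A service that subscribes late never sees earlier messages, so its copy of the data differs from everyone else's.

```python
from typing import Callable, Dict, List, Tuple


class FireAndForgetBus:
    """Hypothetical bus: delivers to whoever is subscribed right now, stores no history."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[dict], None]] = []

    def subscribe(self, handler: Callable[[dict], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, message: dict) -> None:
        for handler in self._subscribers:
            handler(message)


def make_view() -> Tuple[Dict[int, str], Callable[[dict], None]]:
    """Return a local customer-email view and the handler that maintains it."""
    view: Dict[int, str] = {}

    def apply(message: dict) -> None:
        view[message["customer_id"]] = message["email"]

    return view, apply


bus = FireAndForgetBus()

early_view, early_handler = make_view()
bus.subscribe(early_handler)
bus.publish({"customer_id": 1, "email": "a@example.com"})

# A second service starts later and has no way to catch up on customer 1.
late_view, late_handler = make_view()
bus.subscribe(late_handler)
bus.publish({"customer_id": 2, "email": "b@example.com"})
```

After both messages, `early_view` holds customers 1 and 2 while `late_view` holds only customer 2.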

Shared Database

Concentration

Single Point of Failure

Coordination

We clearly have a problem!

Is it data?

We need encapsulation so we don’t expose a service’s internal state.

 

But we need to make it easy for services to get access to shared data so they can get on and do their jobs.

Make data-on-the-outside a first class citizen

Event Driven

Centralize a stream of Events

Share data as immutable Events

Decentralize Event Processing in each Service
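A rough sketch of the idea, not a real broker: `EventLog`, `RevenueService` and `OpenOrdersService` are hypothetical names made up for this illustration. One shared append-only log holds immutable events; each service reads it at its own pace, tracks its own offset, and decides for itself what the events mean.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)  # frozen: an event cannot be modified after creation
class Event:
    kind: str        # e.g. "OrderPlaced" or "OrderFulfilled"
    order_id: int
    value: int = 0


class EventLog:
    """Hypothetical central log: append-only, readable from any offset."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> int:
        self._events.append(event)
        return len(self._events) - 1  # offset of the appended event

    def read_from(self, offset: int) -> List[Event]:
        return self._events[offset:]


class RevenueService:
    """Consumes the log to keep a running total of order value."""

    def __init__(self, log: EventLog) -> None:
        self._log, self._offset, self.total = log, 0, 0

    def poll(self) -> None:
        for event in self._log.read_from(self._offset):
            if event.kind == "OrderPlaced":
                self.total += event.value
            self._offset += 1


class OpenOrdersService:
    """Consumes the same log to track which orders are still open."""

    def __init__(self, log: EventLog) -> None:
        self._log, self._offset = log, 0
        self.open_ids: Set[int] = set()

    def poll(self) -> None:
        for event in self._log.read_from(self._offset):
            if event.kind == "OrderPlaced":
                self.open_ids.add(event.order_id)
            elif event.kind == "OrderFulfilled":
                self.open_ids.discard(event.order_id)
            self._offset += 1


log = EventLog()
log.append(Event("OrderPlaced", order_id=1, value=100))
log.append(Event("OrderPlaced", order_id=2, value=250))
log.append(Event("OrderFulfilled", order_id=1))

revenue = RevenueService(log)
open_orders = OpenOrdersService(log)
revenue.poll()
open_orders.poll()
```

Neither service calls the other; both depend only on the shared stream.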

Services encapsulate their own processing

Derived data from processing is isolated within each service

Derived data is only cached, not permanent

Derived data can always be reconstructed

Derived data can always be changed

Derived data changes with the service*

* Even the version
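A small sketch of the derived-data properties above, assuming the event stream is retained in full. `LOG`, `build_totals_v1` and `build_stats_v2` are hypothetical names. The view is a pure function of the log, so a lost cache is rebuilt by replaying it, and a new service version can derive a differently shaped view from the same events.

```python
from typing import Dict, Iterable, List, Tuple

# (kind, customer_id, order_value): a stand-in for the shared event stream.
Event = Tuple[str, int, int]

LOG: List[Event] = [
    ("OrderPlaced", 7, 100),
    ("OrderPlaced", 8, 40),
    ("OrderPlaced", 7, 60),
]


def build_totals_v1(events: Iterable[Event]) -> Dict[int, int]:
    """Version 1 of the service: total order value per customer."""
    totals: Dict[int, int] = {}
    for kind, customer_id, value in events:
        if kind == "OrderPlaced":
            totals[customer_id] = totals.get(customer_id, 0) + value
    return totals


def build_stats_v2(events: Iterable[Event]) -> Dict[int, Dict[str, int]]:
    """Version 2 of the service: a different shape, derived from the same events."""
    stats: Dict[int, Dict[str, int]] = {}
    for kind, customer_id, value in events:
        if kind == "OrderPlaced":
            entry = stats.setdefault(customer_id, {"orders": 0, "total": 0})
            entry["orders"] += 1
            entry["total"] += value
    return stats


cache = build_totals_v1(LOG)
cache.clear()                    # the cache is lost...
rebuilt = build_totals_v1(LOG)   # ...and reconstructed by replaying the log
upgraded = build_stats_v2(LOG)   # a new version simply derives a new view
```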

Derived data can be any dataset, joined, grouped and aggregated in any way
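As an illustration only, here is a sketch of a service joining Customer and Order events locally instead of calling another service. The event shapes and the function `order_value_by_region` are assumptions made up for this example.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical events from two contexts, interleaved in one stream.
events: List[dict] = [
    {"type": "CustomerUpdated", "customer_id": 1, "region": "CA"},
    {"type": "OrderPlaced", "order_id": 10, "customer_id": 1, "value": 120},
    {"type": "CustomerUpdated", "customer_id": 2, "region": "NY"},
    {"type": "OrderPlaced", "order_id": 11, "customer_id": 2, "value": 80},
    {"type": "OrderPlaced", "order_id": 12, "customer_id": 1, "value": 30},
]


def order_value_by_region(stream: List[dict]) -> Dict[str, int]:
    """Join orders to customers, then group by region and sum order value."""
    region_of: Dict[int, str] = {}              # local copy of the customer data we need
    totals: Dict[str, int] = defaultdict(int)
    for event in stream:
        if event["type"] == "CustomerUpdated":
            region_of[event["customer_id"]] = event["region"]
        elif event["type"] == "OrderPlaced":
            region = region_of.get(event["customer_id"], "unknown")
            totals[region] += event["value"]
    return dict(totals)


by_region = order_value_by_region(events)
```

The join happens inside the service, against its own copy of the data, so no `getOrders...` variant has to be added to anyone's API.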

Centralize an immutable stream of facts.

Decentralize the freedom to act, adapt and change.

- Ben Stopford

The Data Dichotomy

By André Roaldseth
