Strangling The Monolith

Applied Patterns & Practices From The Trenches

Thomas Ploch - Principal Software Engineer @ Flix

Quick Outlook

A small history tour

2015 merger with competitor Flixbus to form the largest long-distance bus travel provider in the German market

Mid-2017 our modernisation journey started

Development began in 2012 with a single development team, supported by one near-shoring team in Ukraine

Tech Stack

  • PHP 5.2
  • MySQL 5.5
  • jQuery 1.8

Architecture

Image: Cover "No Silver Bullet—Essence and Accident in Software Engineering". IEEE Computer (April 1987).

Somehow we, as an industry, still repeatedly end up with many silver bullets.

And it was no different for us!

JEOPARDY TIME!

This architecture pattern was the most widely used software & framework architecture for web-based projects in 2012.

What is ...?

MVC

Model-View-Controller Architecture

  • It's fast to get started with MVC. The market liberalization was a hard deadline, so delivering fast was paramount.

  • All layers should be changeable independently of each other, so scaling development should not be a problem.

  • Ability to provide multiple views, e.g. HTML & JSON for the API.

  • Routing, Security, Templating - batteries included.

Houston, we have a problem!

In our case the cycle time kept increasing!

At some point we couldn't even implement certain features - "that's not possible..."

Lead & Cycle Times

Failure Patterns

1. Anemic Domain Models

class OrderEntity {
    private DateTime $createdAt;
    private string $status;
    private MutableCollection $items;
    
    public function getStatus(): string
    {
        return $this->status;
    }
    
    public function setStatus(string $status): void
    {
        $this->status = $status;
    }

    public function getItems(): MutableCollection
    {
        return $this->items;
    }
}

// Business logic is spread out to many services
class OrderService {
   public function cancelOrder(int $id): void
   {
       $order = $this->fetchOrderFromDB($id);
       $status = $order->getStatus();
       // some validation & processing logic
       $order->setStatus('cancelled');
       $this->saveOrderToDB($order);
   }
}

Mmmmm, Lasagna!

An anemic domain model arises when the classes that describe the model and the classes that perform operations on the model are separate. The services contain all the domain logic, while the domain objects themselves contain practically none.

The classic Lasagna architecture

Chart: "Blood, Sweat & Tears" plotted over "Time" for the Anemic Model and the Rich Model. The time-to-complete regions are labelled "The really sweet spot", "The sweet spot" and "The horrible spot".

2. Weak Boundaries

Diagram: Use Case A, Use Case B, Use Case C and Use Case D around the Order Service

Initially you start with a single use case and everything seems perfectly fine.

Then the services start to be shared between differing use cases.

More and more use cases make use of the shared service because DRY, right?

And over time this initially nicely fitting Order Service becomes an unmaintainable mess: a God Class.

Diagram: Order Entity (Anemic Model) and Order Service
Order Repository

Order Item Repository

Use Case A

Use Case B

Keeping consistency is IMPOSSIBLE!

Use Case A works on Orders and lets the Order model handle the internal consistency.

Use Case B breaks the consistency boundary by operating on the collection of Order Items directly.
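One way to close the gap that Use Case B exploits is to make the Order the only entry point to its items, so that there is no separate path to the Order Items. The following is a minimal sketch under stated assumptions, targeting PHP 8: the `OrderItem` class, the `addItem()` and `totalInCents()` methods and the invariant itself are hypothetical and not taken from the original system.

```php
<?php
declare(strict_types=1);

// Hypothetical item value object, validated on construction.
final class OrderItem
{
    public function __construct(private string $sku, private int $priceInCents)
    {
        if ($priceInCents < 0) {
            throw new InvalidArgumentException('Price must not be negative');
        }
    }

    public function priceInCents(): int
    {
        return $this->priceInCents;
    }
}

// The Order is the only way to reach or change its items.
final class Order
{
    private string $status = 'new';

    /** @var OrderItem[] */
    private array $items = [];

    public function addItem(OrderItem $item): void
    {
        // Assumed invariant: a cancelled order can no longer change.
        if ($this->status === 'cancelled') {
            throw new LogicException('Cannot add items to a cancelled order');
        }
        $this->items[] = $item;
    }

    public function cancel(): void
    {
        $this->status = 'cancelled';
    }

    public function totalInCents(): int
    {
        return array_sum(array_map(
            static fn (OrderItem $item): int => $item->priceInCents(),
            $this->items
        ));
    }
}

$order = new Order();
$order->addItem(new OrderItem('ticket-1', 1999));
$order->addItem(new OrderItem('seat-reservation', 300));
echo $order->totalInCents(), PHP_EOL; // prints 2299
```

Because the items array is private and there is no item repository in this sketch, every use case has to go through the Order, which keeps the consistency rule in one place.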

Developers will very often take the path that was paved before they joined

3. That's how we do things here...

Developer is fed up with the messy system and leaves the company

New developer is hired

New developer follows the paved path within the existing system

Problems get worse and new developer struggles with delivery

Developer wants to change things but stakeholders do not see the necessity

Diagram: Use Case A touching Service A, Service B, Service C, Service D and Service E, each with its own State

Failure Patterns

  • Lasagna architecture made it very complicated to add and change features

  • State management was spread across many services and was a major source of bugs and incidents

  • Developers were very unhappy, and the re-hire cycle accelerated the problems even further

  • All problems reinforced each other and spun out of control very quickly

What can we do?

New path

The 5-step Improvement Process

1. Identify the most valuable components and their boundaries
2. Align the organisation around the future boundary cut-marks
3. Measure & analyze the current system's migration complexity
4. Pick the next most valuable component and start the migration
5. Finish the migration and go back to step 4

1. Identify the Most Valuable Components

The Map places the Value Chain components on an economic evolution horizon. The more commoditized a component is, the less need there is to build & maintain it yourself.

Purpose & Scope set the stage for the mapping exercise

Anchoring the Value Chain on the users' needs is important to keep the focus on the customer value.

The Value Chain identifies dependencies in the value delivery. This helps with understanding your actual core capabilities.

You want everything in core to take priority because you believe that this will give you an edge over your competitors.

Becoming faster at delivering value has the highest ROI in the core quadrant.

2. Re-Align the Organisation

...if you have four groups working on a compiler, you’ll get a four-pass compiler.

Conway's Law (Melvin Conway), as commonly paraphrased

Diagram: .git CODEOWNERS assigning Module A and Module B to Team A and Team B

3. Measure & Analyze

Static

  • Average cyclomatic complexity
  • Afferent/efferent coupling & Instability
  • No. of module dependencies
  • Logical lines of code (size)
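As an illustration of the coupling metrics in this list, instability is usually defined as I = Ce / (Ca + Ce), where Ce is the efferent (outgoing) and Ca the afferent (incoming) coupling of a module. The sketch below computes it from a dependency map. The function name and the module names are hypothetical, and in practice a static analysis tool would supply the dependency data.

```php
<?php
declare(strict_types=1);

/**
 * Instability I = Ce / (Ca + Ce): 0.0 means stable, 1.0 means unstable.
 *
 * @param array<string, string[]> $dependencies module => modules it depends on
 * @return array<string, float> module => instability
 */
function instabilityPerModule(array $dependencies): array
{
    $efferent = [];
    $afferent = [];
    foreach ($dependencies as $module => $targets) {
        $targets = array_unique($targets);
        $afferent[$module] ??= 0;
        $efferent[$module] = count($targets);
        foreach ($targets as $target) {
            $afferent[$target] = ($afferent[$target] ?? 0) + 1;
            $efferent[$target] ??= 0;
        }
    }

    $result = [];
    foreach ($efferent as $module => $ce) {
        $total = $ce + $afferent[$module];
        // A module without any coupling is reported as stable.
        $result[$module] = $total === 0 ? 0.0 : (float) $ce / $total;
    }

    return $result;
}

// Hypothetical module graph, for illustration only.
$instability = instabilityPerModule([
    'Booking'  => ['Orders', 'Payments'],
    'Orders'   => ['Payments'],
    'Payments' => [],
]);

foreach ($instability as $module => $value) {
    printf("%s: %.2f\n", $module, $value);
}
```

A module that many others depend on and that scores close to 0.0 is expensive to change, which is one signal for migration complexity.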

Behavioral

  • VCS change coupling
    • Which things are changing together?
  • VCS change rate per module & team
    • Which things are changed by a team?
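Change coupling can be approximated by counting how often two modules are touched in the same commit. The sketch below assumes the commit history has already been reduced to one list of touched modules per commit, which could for example be derived from `git log --name-only`. The function name and the sample data are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Counts how often each pair of modules changed in the same commit.
 *
 * @param string[][] $commits one list of touched modules per commit
 * @return array<string, int> "moduleA|moduleB" => number of shared commits, highest first
 */
function changeCoupling(array $commits): array
{
    $pairs = [];
    foreach ($commits as $modules) {
        $modules = array_unique($modules);
        sort($modules); // stable pair keys regardless of order within the commit
        $count = count($modules);
        for ($i = 0; $i < $count; $i++) {
            for ($j = $i + 1; $j < $count; $j++) {
                $key = $modules[$i] . '|' . $modules[$j];
                $pairs[$key] = ($pairs[$key] ?? 0) + 1;
            }
        }
    }
    arsort($pairs);

    return $pairs;
}

// Hypothetical history: three commits and the modules each one touched.
$coupling = changeCoupling([
    ['orders', 'payments'],
    ['payments', 'orders', 'search'],
    ['search'],
]);

foreach ($coupling as $pair => $sharedCommits) {
    echo $pair, ': ', $sharedCommits, PHP_EOL;
}
```

Pairs with a high count that sit in different planned components hint at a boundary that will be costly to cut.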

Risk

  • Is revenue generation affected?
  • Scale of operations?
  • High throughput / low latency?

Figure: components A, B, C and D

This will probably never be migrated, and may even be a trigger to give up on this component completely!

4. Let's get Started!

SELECT * FROM orders WHERE ...
INSERT INTO orders VALUES ...
UPDATE orders SET ... WHERE ...

Diagram: the queries above connected to Use Case A, Use Case B, Use Case C, Use Case D, Use Case E and Use Case F

Traces connect the database and use cases through the code

Tip: Start with the INSERT use cases first, since they represent the genesis cases.

Most tracing tools/agents support the major database clients and connect queries with code traces and/or logs.
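Once traces are exported, the mapping from table to use cases can be done with a small script. The following is a rough sketch, assuming each exported span has been flattened to an entry point and an SQL statement. The array shape, the function name and the sample routes are hypothetical and do not reflect any particular tracing tool's export format.

```php
<?php
declare(strict_types=1);

/**
 * Groups traced SQL statements on one table by the entry point that issued them.
 *
 * @param array<int, array{entryPoint: string, statement: string}> $spans
 * @return array<string, string[]> entry point => operations, INSERT entry points first
 */
function useCasesForTable(array $spans, string $table): array
{
    $useCases = [];
    foreach ($spans as $span) {
        // Naive parsing: the leading keyword is taken as the operation.
        if (!preg_match('/^\s*(SELECT|INSERT|UPDATE|DELETE)\b/i', $span['statement'], $m)) {
            continue;
        }
        // The table name must appear as a whole word somewhere in the statement.
        if (!preg_match('/\b' . preg_quote($table, '/') . '\b/i', $span['statement'])) {
            continue;
        }
        $useCases[$span['entryPoint']][strtoupper($m[1])] = true;
    }
    $useCases = array_map('array_keys', $useCases);

    // Entry points that INSERT come first, since they create the data.
    uasort(
        $useCases,
        static fn (array $a, array $b): int =>
            in_array('INSERT', $b, true) <=> in_array('INSERT', $a, true)
    );

    return $useCases;
}

// Hypothetical spans, for illustration only.
$spans = [
    ['entryPoint' => 'GET /orders/{id}', 'statement' => 'SELECT * FROM orders WHERE id = ?'],
    ['entryPoint' => 'POST /checkout', 'statement' => 'INSERT INTO orders (status) VALUES (?)'],
    ['entryPoint' => 'POST /orders/{id}/cancel', 'statement' => 'UPDATE orders SET status = ? WHERE id = ?'],
    ['entryPoint' => 'GET /search', 'statement' => 'SELECT * FROM trips WHERE origin = ?'],
];

foreach (useCasesForTable($spans, 'orders') as $entryPoint => $operations) {
    echo $entryPoint, ': ', implode(', ', $operations), PHP_EOL;
}
```

The regular expressions are deliberately simple. Statements with subqueries or joins would need a real SQL parser or the structured attributes the tracing agent provides.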

Diagram: spans collected by an OpenTelemetry / observability platform

class Order {
    private DateTime $date;
    private string $status;
    // Impossible to construct an invalid state
    public function __construct() {
        $this->date = new DateTime(
            'now',
            new DateTimeZone('UTC'),
        );
        $this->status = 'new';
    }
}

class Order {
    // ...
    
    public function cancel(): void {
        // now the validation is within the model
        if ($this->status === 'cancelled') {
            throw new LogicException(
                'Invalid status cancelled'
            );
        }
        $this->status = 'cancelled';
    }
}

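class OrderService {
    private OrderRepository $repository;

    // The repository is injected: a typed property must be initialised before use
    public function __construct(OrderRepository $repository)
    {
        $this->repository = $repository;
    }

    public function cancel(int $id): void
    {
        $order = $this->repository->get($id);
        // The service only orchestrates, the rule itself lives in the model
        $order->cancel();
        $this->repository->save($order);
    }
}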

Use Case A

// The Strangler Facade is just an abstraction
// It will have multiple implementations
interface OrderFacade {

    public function place(
        Id $id,
        Money $value,
        CustomerId $customerId,
    ): OrderCreatedEvent;

    public function cancel(Id $id): OrderCancelledEvent;
}

Diagram components: Strangler Facade, Transactional Outbox (within the TX boundary), Outbox Consumer, Order Event, New Order Service

This has to be repeated for all the identified use cases

Once we have covered all the use cases with the Strangler Facade the new service is completely in sync with the legacy system!
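The Transactional Outbox part of this setup can be sketched as follows. This is a simplified illustration, not the implementation from the talk: it uses plain integer ids instead of the `Id` type, and the `orders` and `outbox` table layouts, the class name `OutboxOrderWriter` and the function `relayOutbox()` are all assumptions. The usage part relies on the `pdo_sqlite` extension for an in-memory database.

```php
<?php
declare(strict_types=1);

// Legacy-side write path: the state change and the event row are stored
// in one database transaction (the "TX boundary").
final class OutboxOrderWriter
{
    public function __construct(private PDO $pdo)
    {
    }

    public function cancel(int $orderId): void
    {
        $this->pdo->beginTransaction();
        try {
            $update = $this->pdo->prepare(
                "UPDATE orders SET status = 'cancelled' WHERE id = :id AND status <> 'cancelled'"
            );
            $update->execute(['id' => $orderId]);
            if ($update->rowCount() !== 1) {
                throw new LogicException('Order not found or already cancelled');
            }
            $insert = $this->pdo->prepare(
                'INSERT INTO outbox (event_type, payload) VALUES (:type, :payload)'
            );
            $insert->execute([
                'type' => 'OrderCancelled',
                'payload' => json_encode(['orderId' => $orderId], JSON_THROW_ON_ERROR),
            ]);
            $this->pdo->commit();
        } catch (Throwable $e) {
            if ($this->pdo->inTransaction()) {
                $this->pdo->rollBack();
            }
            throw $e;
        }
    }
}

// Outbox Consumer: publishes pending events in insertion order and marks them.
// Delivery is at-least-once, so the receiving side has to be idempotent.
function relayOutbox(PDO $pdo, callable $publish): int
{
    $rows = $pdo
        ->query('SELECT id, event_type, payload FROM outbox WHERE published = 0 ORDER BY id')
        ->fetchAll(PDO::FETCH_ASSOC);
    $mark = $pdo->prepare('UPDATE outbox SET published = 1 WHERE id = :id');
    foreach ($rows as $row) {
        $publish($row['event_type'], $row['payload']);
        $mark->execute(['id' => $row['id']]);
    }

    return count($rows);
}

// Usage with an in-memory SQLite database and a publisher that just prints.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL)');
$pdo->exec('CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT, event_type TEXT NOT NULL,
    payload TEXT NOT NULL, published INTEGER NOT NULL DEFAULT 0)');
$pdo->exec("INSERT INTO orders (id, status) VALUES (1, 'new')");

(new OutboxOrderWriter($pdo))->cancel(1);
relayOutbox($pdo, static function (string $type, string $payload): void {
    echo $type, ' ', $payload, PHP_EOL;
});
```

Because the event row is committed together with the status change, an event is never lost when the process crashes between the two writes. The price is that the consumer may publish an event twice.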

Diagram components: Strangler Facade, Remote Client, New Order Service, Order Event, Legacy Sync Consumer

All the changes behind the Strangler Facade are completely transparent to the clients!

Keeping the legacy system up to date for some time will be necessary due to reporting and other use cases!
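Why the switch stays invisible to clients can be shown with a reduced version of the facade. The sketch below is an assumption-laden simplification: the interface has a single method with scalar types instead of the `Id` and event types used earlier, and `LegacyOrderFacade`, `RemoteOrderFacade` and `CancelOrderController` are hypothetical names.

```php
<?php
declare(strict_types=1);

// Simplified stand-in for the Strangler Facade: one method, scalar types.
interface OrderFacade
{
    public function cancel(int $id): string;
}

// Early phase: the legacy code path still owns the write.
final class LegacyOrderFacade implements OrderFacade
{
    public function cancel(int $id): string
    {
        return "legacy:cancelled:$id";
    }
}

// Later phase: the call is forwarded to the new service through a remote client.
final class RemoteOrderFacade implements OrderFacade
{
    // The closure stands in for a real HTTP or RPC client call.
    public function __construct(private Closure $remoteCancel)
    {
    }

    public function cancel(int $id): string
    {
        return ($this->remoteCancel)($id);
    }
}

// The client depends on the interface only and never learns which phase is active.
final class CancelOrderController
{
    public function __construct(private OrderFacade $orders)
    {
    }

    public function handle(int $id): string
    {
        return $this->orders->cancel($id);
    }
}

$before = new CancelOrderController(new LegacyOrderFacade());
$after = new CancelOrderController(
    new RemoteOrderFacade(static fn (int $id): string => "remote:cancelled:$id")
);

echo $before->handle(42), PHP_EOL;
echo $after->handle(42), PHP_EOL;
```

Only the wiring changes between the two phases. The controller code is identical in both.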

5. Finish the migration and go back to step 4

Success Patterns

  • Start by creating a new target picture based on strategic and economic factors. For us, Wardley Maps, Strategic DDD and Team Topologies worked.

  • Realign the organisation around the new system architecture. The Inverse Conway Maneuver worked for us, but it's not for everyone since it is a huge social perturbation.

  • Measure & analyze the code, the behaviors and the risks before diving into implementation. You'll often find surprises!

  • Design your new APIs using collaborative modelling and modern approaches. In the end, that's what modernization is all about.

  • Leverage observability platforms and other tooling to find the entry points to the use cases you want to tackle, by analyzing the traces from the DB to the code entry points.

  • Tackle each of the entry points with the newly designed Strangler Facade. We used the Transactional Outbox pattern and asynchronous event integration successfully.

  • Continue while it's valuable for the organisation, but constantly inspect and adapt.

Thank You!

We're Hiring!

Slides online

Strangling the Monolith - Applied Patterns & Practices From The Trenches @ NDC Oslo 2022

By Thomas Ploch

Many developers have been there: a beautifully crafted system that has helped the company grow slowly deteriorates into a big ball of mud. The calls for a rewrite become louder and louder, but big-bang rewrites historically carry a high risk of failure. Is there a way to step out of the mud iteratively? We at Flixbus faced a similar situation and were looking for answers - and we found one! The strangler (fig) pattern allowed us to incrementally evolve even our high-risk core services towards a modern reactive system architecture. This sounds easy, but our path was paved with hard learning and many failures. In this session I will present the patterns and practices that helped us on our journey and the many pitfalls we found.