Pioneering Serverless Payments at The LEGO Group

Luke Hedger

The LEGO Group

Serverless Architecture Conf - Oct 2022

Pioneering Serverless Payments

September 2018

LEGO.com

Payments platform

Brickbank

20+ Microservices
80+ Lambda functions
100+ GB in DynamoDB
10+ Step Functions workflows
40+ EventBridge rules

Contain complexity

Image from cockroachlabs.com

async

processes

365+ days

shoppers

Scale, scale, scale

Image from leanovia.com

May 4th

What happens after the serverless launch, after the pioneering?

How does a mature serverless app stay modern?

Post-Pioneer

Growing up

  • Different challenges today vs start
  • Stability vs speed
  • Serverless has changed
  • Architecture and operations may not scale

The tradeoff

DELIVERY SPEED

APP STABILITY

Decisions

Serverless Things: Start → Scale

  • Delivery pipelines - Start with one, scale to many. Move focus from implementation to delivery
  • Separate concerns - Start with few, scale to many. Collect domain knowledge, avoid wrong abstraction
  • 3rd party tools - Didn't exist (EventBridge) or unproven (CDK). Evolvable architecture
  • Emerging standards - Serverless matures, community grows, discussion amplifies
  • Test strategy - Stability > speed → stability == speed
  • Observability - Unknowns decrease over time

Finding balance

DELIVERY SPEED

APP STABILITY

Serverless stability is often compromised by complex events

Application Stability

Baskets

Orders

Domains

Payments

Shopper checkout

Order submission

Payment requests

Asynchronous authorisations

Order fulfilment

Current: pull

[Diagram: Baskets, Orders and Payments domains. Labels: "Order submitted", "Is payment authorised?", "Payment authorised". Steps numbered 1 to 3.]

Future: push

[Diagram: Baskets, Orders and Payments domains. Labels: "Order submitted", "Payment authorised". Step markers: 1, 1, 1, 2.]

Cross-domain events flow

Cross-domain events

  • Consumable - simple onboarding, familiar delivery
  • Discoverable - well-documented
  • Standard - standards-compliant, schema-based, interoperable
  • Versioned - predictable payloads, backwards-compatible
  • Secure - encrypted, verifiable messages
  • Scalable - handle high volume (hundreds of events per second)
  • Resilient - fault tolerant, replayable, high-availability

Standard events

Amazon EventBridge

  • A serverless event bus with powerful rule-based filtering
  • Already used in domain - no new tech, subscribe to existing events
  • Target AWS workloads (cross-account events) and non-AWS consumers (HTTP API Destinations)

Event hierarchy

  • Topical hierarchy of event types allows for granular filtering upon consumption
  • {Entity}.{Event}.{Metadata}
  • Metadata could be locale, platform etc
Payment.Authorised.Ecom
Payment.Authorised.Pos
Order.Created.Uk
Basket.Paid.Eu
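
The hierarchy above lends itself to prefix filtering on consumption. A minimal sketch, assuming the event type is carried in the EventBridge `detail-type` field (the slides do not say which field is used). The helper names `parse_event_type`, `prefix_pattern` and `matches` are hypothetical, and `matches` is only a local stand-in for the prefix matching that EventBridge performs itself:

```python
from typing import Optional


def parse_event_type(event_type: str) -> dict:
    """Split an '{Entity}.{Event}.{Metadata}' event type into its parts."""
    entity, event, metadata = event_type.split(".", 2)
    return {"entity": entity, "event": event, "metadata": metadata}


def prefix_pattern(entity: str, event: Optional[str] = None) -> dict:
    """Build an event pattern that prefix-matches on detail-type.

    Assumption: the event type lives in 'detail-type'.
    """
    prefix = f"{entity}." if event is None else f"{entity}.{event}."
    return {"detail-type": [{"prefix": prefix}]}


def matches(pattern: dict, event_type: str) -> bool:
    """Local approximation of prefix matching, for illustration only."""
    return any(event_type.startswith(rule["prefix"])
               for rule in pattern["detail-type"])


# A consumer interested in every authorised payment, whatever the channel.
pattern = prefix_pattern("Payment", "Authorised")
```

With this pattern, `Payment.Authorised.Ecom` and `Payment.Authorised.Pos` would both be delivered, while `Order.Created.Uk` would be filtered out.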

Event evolution

  • Handle changes to events via additions whenever possible
  • Introduce new event types if breaking changes are required e.g. Payment.Authorised, Payment.AuthorisedV2, Payment.Completed
  • Produce both the old event and the new event for some time/forever to minimise disruption
  • Increment AsyncAPI version accordingly - new events = major, updated events = minor, fixes = patch
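
The versioning rule in the last bullet can be written down as a small function. This is a sketch only: the function name `bump_version` and the change labels `new_event`, `updated_event` and `fix` are hypothetical, chosen to mirror the bullet's wording.

```python
def bump_version(version: str, change: str) -> str:
    """Return the next AsyncAPI document version for a given kind of change.

    Rule taken from the slide: new events = major, updated events = minor,
    fixes = patch. The change labels are illustrative assumptions.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "new_event":
        return f"{major + 1}.0.0"
    if change == "updated_event":
        return f"{major}.{minor + 1}.0"
    if change == "fix":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")
```

For example, introducing `Payment.AuthorisedV2` alongside `Payment.Authorised` would take a document at `1.4.2` to `2.0.0` under this rule.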

Event enrichment

  • JSON Patch - standards-based event payload enrichment
  • Reduces custom logic and risk of regression
  • JSON Patch operations: add, remove, replace, copy, move
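
To illustrate how the five operations listed above reshape a payload, here is a deliberately simplified JSON Patch applier. It is a sketch, not the platform's implementation: it handles object members only (no array indices, no root pointer, no `test` operation), and the event fields are invented for the example. A real system would more likely use a maintained RFC 6902 library.

```python
import copy


def _tokens(pointer: str) -> list:
    """Split a JSON Pointer such as '/amount/value' into unescaped tokens."""
    if not pointer.startswith("/"):
        raise ValueError(f"unsupported JSON Pointer: {pointer!r}")
    # RFC 6901 unescaping: '~1' becomes '/', then '~0' becomes '~'.
    return [t.replace("~1", "/").replace("~0", "~")
            for t in pointer[1:].split("/")]


def _parent(doc: dict, pointer: str):
    """Return the object that contains the pointed-at member, plus its key."""
    *path, last = _tokens(pointer)
    node = doc
    for token in path:
        node = node[token]
    return node, last


def apply_patch(doc: dict, operations: list) -> dict:
    """Apply add/remove/replace/copy/move to a copy of doc (objects only)."""
    result = copy.deepcopy(doc)
    for op in operations:
        kind = op["op"]
        if kind in ("copy", "move"):
            src_parent, src_key = _parent(result, op["from"])
            value = copy.deepcopy(src_parent[src_key])
            if kind == "move":
                del src_parent[src_key]
        else:
            value = op.get("value")
        parent, key = _parent(result, op["path"])
        if kind == "remove":
            del parent[key]
        elif kind == "replace":
            if key not in parent:
                raise KeyError(key)  # replace requires an existing member
            parent[key] = value
        elif kind in ("add", "copy", "move"):
            parent[key] = value
        else:
            raise ValueError(f"unsupported operation: {kind!r}")
    return result


# Hypothetical payment event and enrichment patch.
event = {"paymentId": "p-1", "internalRef": "x",
         "amount": {"value": 1999, "currency": "GBP"}}
patch = [
    {"op": "add", "path": "/channel", "value": "Ecom"},
    {"op": "remove", "path": "/internalRef"},
    {"op": "replace", "path": "/amount/value", "value": 2099},
    {"op": "copy", "from": "/amount/currency", "path": "/currency"},
    {"op": "move", "from": "/paymentId", "path": "/id"},
]
enriched = apply_patch(event, patch)
```

Because the enrichment is expressed as data rather than code, a change to what gets added or stripped is a change to the patch document, which is where the reduction in custom logic comes from.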

Secure events

  • Nested JSON Web Tokens (JWTs)
  • Signed - "is this event from a trusted source?"
  • Encrypted - sensitive data is encrypted in transit
  • Encoded - compressed payloads

Event validation

  • Events can be described as JSON Schema documents
  • Validate event payloads against event schema in unit tests (e.g. with AJV)
  • Producer (and consumers) could also validate at runtime
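
As a rough picture of what such a unit test checks, the sketch below validates a payload against a schema using a hand-rolled checker that understands only `type`, `required` and `properties`. It stands in for a real JSON Schema validator such as AJV and is not a substitute for one. The schema and its field names are hypothetical.

```python
_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
}


def validate(instance, schema: dict, path: str = "$") -> list:
    """Return a list of error strings; an empty list means the payload passed.

    Supports only a small subset of JSON Schema: type, required, properties.
    """
    expected = schema.get("type")
    if expected is not None:
        ok = isinstance(instance, _TYPES[expected])
        # In Python, bool is a subclass of int; JSON Schema treats them apart.
        if isinstance(instance, bool) and expected in ("integer", "number"):
            ok = False
        if not ok:
            return [f"{path}: expected {expected}"]
    errors = []
    if isinstance(instance, dict):
        for name in schema.get("required", []):
            if name not in instance:
                errors.append(f"{path}.{name}: required property missing")
        for name, subschema in schema.get("properties", {}).items():
            if name in instance:
                errors.extend(validate(instance[name], subschema, f"{path}.{name}"))
    return errors


# Hypothetical schema for a Payment.Authorised event.
PAYMENT_AUTHORISED = {
    "type": "object",
    "required": ["paymentId", "amount", "currency"],
    "properties": {
        "paymentId": {"type": "string"},
        "amount": {"type": "integer"},
        "currency": {"type": "string"},
    },
}

good = {"paymentId": "p-1", "amount": 1999, "currency": "GBP"}
bad = {"paymentId": "p-1", "amount": "19.99"}
```

The same `validate` call could run in a unit test on the producer side and, if the cost is acceptable, at runtime before an event is published or consumed.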

Contract testing

  • Event contract between producer and consumers
  • Broker-less contract testing - replace traditional "broker" with a central registry of schema/contracts e.g. GitHub Packages
  • Schemas need to be versioned and distributed to consumers
  • Test workflow: assert that the HTTP response matches the schema (expect.toMatchSchema)
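
One check a producer might run before publishing a new schema version to the registry is a comparison against the previous version, flagging anything that could break existing consumers. The sketch below is an assumption about how such a gate could look, not a description of the talk's tooling. The function name `breaking_changes` and the schemas are hypothetical, and only top-level `properties` and `required` are compared.

```python
def breaking_changes(old_schema: dict, new_schema: dict) -> list:
    """List differences in new_schema that could break consumers of old_schema."""
    problems = []
    old_props = old_schema.get("properties", {})
    new_props = new_schema.get("properties", {})
    for name, spec in old_props.items():
        if name not in new_props:
            problems.append(f"removed property: {name}")
        elif new_props[name].get("type") != spec.get("type"):
            problems.append(f"type changed: {name}")
    # A field consumers could rely on must stay guaranteed by the producer.
    for name in old_schema.get("required", []):
        if name not in new_schema.get("required", []):
            problems.append(f"no longer required: {name}")
    return problems


# Hypothetical schema versions for a Payment.Authorised event.
v1 = {
    "type": "object",
    "required": ["paymentId", "currency"],
    "properties": {"paymentId": {"type": "string"},
                   "currency": {"type": "string"}},
}
# Additive change: a new optional property.
v1_1 = {
    "type": "object",
    "required": ["paymentId", "currency"],
    "properties": {"paymentId": {"type": "string"},
                   "currency": {"type": "string"},
                   "channel": {"type": "string"}},
}
# Breaking change: a property removed and another retyped.
v2 = {
    "type": "object",
    "required": ["paymentId"],
    "properties": {"paymentId": {"type": "integer"}},
}
```

An empty result suggests the change is additive and can ship under the same event type; a non-empty result suggests a new event type is needed.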

Delivery Speed

We need to observe production behaviour ASAP when using managed services

Continuous Serverless Delivery

Serverless SDLC

  • Solution design - architecture, service limits
  • Targeted refactoring, strangler pattern
  • Optimise for test confidence, not coverage
  • Isolated service deployments
  • Use alarms!

Design → Develop → Test → Release → Observe

Speed up tests by reducing coverage

Serverless Square of Balance

Designing a test strategy allows us to ship at a sustainable rate as our codebase grows

Serverless Test Strategy

❌ Test everything all the time

❌ Long-running tests

❌ Too-good tests

❌ Ignore failures/hit retry and 🤞

Identify critical paths ✅

Test against the cloud ✅

Test integration points/event sources ✅

Blend with observability ✅

Recap

Pioneering serverless payments

  • Optimise for application stability but not at the expense of delivery
  • Resilient events are key to maintaining serverless stability
  • Use AsyncAPI and CloudEvents to standardise events
  • Use Amazon EventBridge to choreograph events at scale and JSON Patch for event enrichment
  • Devise a comprehensive test strategy with a variety of tests
  • Balance testing with rapid delivery pipelines, fault tolerance and observability of application health

Thanks!

Pioneering Serverless Payments at The LEGO Group

The LEGO Group was a super early adopter of the serverless mindset and associated technology. It is time to review how one of its core products continues to operate and innovate 3 years on. We will take a look at the serverless, event-driven architecture powering The LEGO Group's payments platform, Brickbank, and how it facilitates e-commerce on a global scale. We'll explore the full software delivery lifecycle of the payments platform, including development, testing, release and observability. We will look at some of the key AWS services that play crucial roles in the platform's architecture, including Lambda, Step Functions, DynamoDB, EventBridge, CDK and Macie. We will also assess how a mature serverless application remains up-to-date with constantly evolving best practices, industry trends and feature requests.
