Infrastructure Vision 2023

Where should we start?

With a little story

Let's plan a festival

What do we need?

A checklist?

Requirements for a little festival

  • The fun part
    • Space
    • Acts
    • Enough food/drinks
  • The not-so-fun part
    • Bathrooms/toilets
    • Some trash cans

Requirements for a small festival

  • The fun part
    • More space
    • More acts
    • More, and more diverse, food/drinks
  • The not-so-fun part
    • Backstage area
    • More bathrooms/toilets
    • Water and wastewater
    • Disposal system
    • Permits
    • Permissions

Requirements for a bigger festival

  • The fun part
    • Space
    • More acts
    • More, and more diverse, food/drinks
  • The not-so-fun part
    • Multiple backstages with restrooms and special requirements
    • Multiple restroom areas
    • Water and wastewater
    • Disposal system
    • Permits
    • Runways
    • Security
    • Paramedics
    • Escape routes
    • Regulatory frameworks

Requirements for a large festival

  • The fun part
    • Space
    • More acts
    • More, and more diverse, food/drinks
  • The not-so-fun part
    • Multiple backstages
    • Multiple restroom areas
    • Water and wastewater
    • Disposal system
    • Permits
    • Runways
    • Security
    • Paramedics
    • Escape routes
    • Regulatory frameworks
    • Traffic management
    • Backups
    • Monitoring
    • ...

How do we deal with the not-so-fun parts?

With rulesets and logic

  • Running orders
  • "Do and don't" lists
    • Team
    • Guests
  • Diagrams and plans
  • Enforce scalable units (tent sizes)
  • React based on monitoring

Infrastructure Vision FOREVER

Solve infrastructure problems with code

Infrastructure as Code (IaC)

SSH is dead

Immutability

It works on my computer

It actually works on your computer too

Git is our TRUTH

A look at what we have

Declarative IaC

# Pin the Terraform CLI version this configuration was written for
terraform {
  required_version = "0.11.13"
}

provider "aws" {
  region = "eu-central-1"
}

# Publicly readable S3 bucket serving a static website
resource "aws_s3_bucket" "your_new_bucket" {
  bucket = "my-first-website-cloud-native-website"
  acl    = "public-read"

  website {
    index_document = "index.html"
  }
}

# A single Pod running nginx and exposing port 80
apiVersion: v1
kind: Pod
metadata:
  name: nicepod
  labels:
    app: dev
spec:
  containers:
    - name: web
      image: nginx
      ports:
        - name: web
          containerPort: 80
          protocol: TCP

*An imperative approach is possible too

GitOps with Terraform Cloud

GitOps with ArgoCD

Special thanks to the ugly avatar bot for deploying all the stuff for us.
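A minimal sketch of what an Argo CD Application could look like in a GitOps setup like this one. The repository URL, path, application name and namespace are hypothetical placeholders, not parcelLab's actual configuration; the idea is that Argo CD continuously syncs the cluster to whatever is committed to Git.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: example-app            # hypothetical application name
  namespace: argocd            # namespace where Argo CD itself runs
spec:
  project: default
  source:
    # Hypothetical repository and path holding the app's manifests
    repoURL: https://github.com/example-org/deployment.git
    targetRevision: main
    path: apps/example-app
  destination:
    server: https://kubernetes.default.svc   # the cluster Argo CD runs in
    namespace: example-app
  syncPolicy:
    automated:
      prune: true      # delete resources that were removed from Git
      selfHeal: true   # revert manual changes made directly in the cluster
    syncOptions:
      - CreateNamespace=true
```

With automated sync enabled, a merged pull request is the deployment: nobody needs shell access to the cluster.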

A hero without a cape

parcelLab/deployment.git

Collaboration with Infrastructure as Code is efficient

parcelLab/infrastructure.git

EC2

Our AWS datacenter running parcelLab's EC2 instances - true story

EKS

aka "Kubernetes where AWS does all the nasty stuff"

  • Lightweight
  • Secure
  • Open source
  • Optimized for EC2

Supported and managed by AWS

Karpenter

Just-in-time nodes

  • Open source
  • Improve app availability
  • Lower compute costs
  • Minimal operational overhead
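As a hedged sketch, assuming the Karpenter v1 API (older releases used a Provisioner resource instead), a NodePool like the one below lets Karpenter launch right-sized nodes when pods are pending, allow both spot and on-demand capacity, and consolidate underused nodes. The pool name, CPU limit and the referenced EC2NodeClass called `default` are hypothetical.

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default                # hypothetical pool name
spec:
  template:
    spec:
      requirements:
        # Allow both cheaper spot capacity and on-demand capacity
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]
      # AWS-specific node settings (AMI, subnets, IAM role) live in an EC2NodeClass
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default          # hypothetical node class name
  # Upper bound on the total CPU this pool may provision
  limits:
    cpu: "1000"
  disruption:
    # Remove or replace nodes that are empty or underutilized
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
```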

[Diagram: Node 1 to Node 4 being added just in time]

EC2 + Karpenter + Bottlerocket + EKS

Our AWS datacenter now - not exaggerated
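In such a stack, the Bottlerocket part typically sits in the Karpenter node class. A sketch assuming the Karpenter v1 API; the IAM role name and the `karpenter.sh/discovery` tag value are hypothetical placeholders for the actual cluster setup.

```yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  # Launch nodes from the latest Bottlerocket AMI
  amiSelectorTerms:
    - alias: bottlerocket@latest
  # Hypothetical IAM role assumed by the EC2 nodes
  role: KarpenterNodeRole-my-cluster
  # Discover subnets and security groups by tag (tag value is hypothetical)
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
```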

parcelBazaar

Deployment as a Service

  • plconfig v1 (easy deployments)
  • Scaling capabilities (manual or automatic)
  • Monitoring in Datadog
  • Fine-grained permissions per team (jail)
  • Secrets management via environment variables
  • HTTPS ingress with automatically created certificates

Teams only take care of the app configuration and its container(s)
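Fine-grained permissions per team (the "jail") map naturally onto a Kubernetes namespace plus RBAC. The manifest below only sketches that idea and is not the actual output of plconfig v1, whose format is not shown here. The namespace, role and group names are hypothetical, and the group would have to come from the cluster's authentication setup.

```yaml
# One namespace per team acts as the "jail"
apiVersion: v1
kind: Namespace
metadata:
  name: team-example              # hypothetical team namespace
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: team-developer
  namespace: team-example
rules:
  # Read-only access to the team's core resources
  - apiGroups: [""]
    resources: ["pods", "pods/log", "services", "configmaps"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch"]
  # Manual scaling of the team's own deployments
  - apiGroups: ["apps"]
    resources: ["deployments/scale"]
    verbs: ["get", "update", "patch"]
---
# Grant the role to the team's group, only inside its namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-developer
  namespace: team-example
subjects:
  - kind: Group
    name: team-example-developers  # hypothetical group name
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: team-developer
  apiGroup: rbac.authorization.k8s.io
```

Because a Role and RoleBinding are namespaced, the same definition can be stamped out per team without granting access to anyone else's workloads.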

Monitoring as a Service

  • Datadog enablement
  • Dashboards for customers, etc.
  • A 360° view of our system
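One common way to make workloads appear consistently in Datadog is unified service tagging. This sketch assumes the Datadog Agent already runs in the cluster; the service name, environment, version and image are placeholders.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
  labels:
    # Unified service tags on the Deployment itself
    tags.datadoghq.com/env: production
    tags.datadoghq.com/service: example-app
    tags.datadoghq.com/version: "1.4.2"
spec:
  replicas: 2
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
        # The same tags on the pods, so metrics, traces and logs line up
        tags.datadoghq.com/env: production
        tags.datadoghq.com/service: example-app
        tags.datadoghq.com/version: "1.4.2"
    spec:
      containers:
        - name: web
          image: nginx:1.25
          env:
            # Expose the tags to the tracing library via environment variables
            - name: DD_ENV
              valueFrom:
                fieldRef:
                  fieldPath: metadata.labels['tags.datadoghq.com/env']
            - name: DD_SERVICE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.labels['tags.datadoghq.com/service']
            - name: DD_VERSION
              valueFrom:
                fieldRef:
                  fieldPath: metadata.labels['tags.datadoghq.com/version']
```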

Authentication as a Service

  • Keycloak and Auth0 under evaluation
    • Securing applications by configuration
    • Externalising the administrative overhead of authorisation
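"Securing applications by configuration" could look like the sketch below. It assumes ingress-nginx in front of the apps and an oauth2-proxy instance (service `oauth2-proxy`, port 4180) configured against Keycloak or Auth0 as the OIDC provider; the slides confirm none of this, and the host and service names are hypothetical.

```yaml
# The application itself: every request is checked by oauth2-proxy first
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: internal-tool
  annotations:
    # Subrequest to oauth2-proxy decides whether the request is authenticated
    nginx.ingress.kubernetes.io/auth-url: "https://$host/oauth2/auth"
    # Unauthenticated users are sent to the login flow, then back to the page
    nginx.ingress.kubernetes.io/auth-signin: "https://$host/oauth2/start?rd=$escaped_request_uri"
spec:
  ingressClassName: nginx
  rules:
    - host: tool.internal.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: internal-tool
                port:
                  number: 80
---
# Route /oauth2 on the same host to oauth2-proxy
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: internal-tool-oauth2-proxy
spec:
  ingressClassName: nginx
  rules:
    - host: tool.internal.example.com
      http:
        paths:
          - path: /oauth2
            pathType: Prefix
            backend:
              service:
                name: oauth2-proxy
                port:
                  number: 4180
```

The application code stays unaware of authentication; protecting another tool means adding two annotations rather than writing login logic.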

The actual 2023 vision

Containerized services in Kubernetes

  • Move all legacy workloads (EC2, ECS, Lambda functions, ...)
  • Reduce infrastructure costs
    • Spot instances
    • Karpenter autoscaling
    • Bottlerocket
    • Monitoring to tweak resource requests and limits
  • Give more options for automatic and manual scaling of workloads
  • Faster scaling and adaptation to unexpected loads
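The cost and scaling levers above can meet in a single workload definition. A sketch with hypothetical names and numbers: the pod asks Karpenter for spot capacity through the well-known `karpenter.sh/capacity-type` label, declares requests and limits that monitoring data would help tune, and a HorizontalPodAutoscaler scales it automatically between 2 and 20 replicas.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-worker
spec:
  replicas: 2
  selector:
    matchLabels:
      app: example-worker
  template:
    metadata:
      labels:
        app: example-worker
    spec:
      # Schedule only on spot nodes (cheaper, but may be interrupted)
      nodeSelector:
        karpenter.sh/capacity-type: spot
      containers:
        - name: worker
          image: registry.example.com/worker:1.0.0   # hypothetical image
          resources:
            # Requests drive scheduling and node sizing; tune them from monitoring data
            requests:
              cpu: 250m
              memory: 256Mi
            # Memory limit guards against leaks; no CPU limit avoids throttling
            limits:
              memory: 512Mi
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-worker
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-worker
  minReplicas: 2
  maxReplicas: 20
  metrics:
    # Add replicas when average CPU usage exceeds 70% of the requested CPU
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

Manual scaling then becomes a Git change to `minReplicas`/`maxReplicas`. Since spot nodes can be reclaimed, this pattern fits stateless or retry-safe workloads best.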

Monitoring enablement

  • Teams own their monitoring
  • Lots of monitoring built in by default

Single Sign On everywhere

  • Use our internal Microsoft account to sign in to every internal tool
  • Manage Azure with Terraform so that onboarding/offboarding in Microsoft is automated too
  • Define the group scheme and permissions the same way we do for GitHub/AWS

Ownership of common infrastructure

  • Only the "base" part (common to multiple teams)
    • MongoDB
    • SES
    • SQS, ...
  • The goal is to enable other teams to create their own resources there (and monitor them)

Vanity domain white-labeling

  • Leverage Let's Encrypt (already in place) to save money
  • Cloudflare has some functions that need to be migrated with care
  • Customer migration plan
  • Self-service enablement for our colleagues
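With Let's Encrypt already in place, onboarding a vanity domain could amount to one Ingress whose certificate is issued automatically. A sketch assuming cert-manager and ingress-nginx; the issuer name, e-mail address, hostname and service are hypothetical.

```yaml
# Cluster-wide issuer talking to the Let's Encrypt production ACME endpoint
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ops@example.com                  # hypothetical contact address
    privateKeySecretRef:
      name: letsencrypt-prod-account-key    # stores the ACME account key
    solvers:
      # Prove domain ownership via HTTP-01 challenges served by ingress-nginx
      - http01:
          ingress:
            ingressClassName: nginx
---
# A customer's vanity domain; cert-manager creates the TLS secret automatically
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: tracking-page-customer-example
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - track.customer-example.com
      secretName: track-customer-example-com-tls
  rules:
    - host: track.customer-example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: tracking-page
                port:
                  number: 80
```

A self-service flow for colleagues would then only need to generate the second document per customer domain.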

Stay tuned: January 12th 2023

Infrastructure vision v2

By andibeuge
