Global Tenant Management using Argo

Alex Van Boxel
Principal Systems Architect, Collibra

Almost 30 years in the sector

Mostly as a Software Engineer: Web - 3D - Middleware - Mobile - Big Data

More recently as an Architect: Data - SRE - Infrastructure

Community: Apache Beam contributor, OpenTelemetry Collector contributor

ArgoCon 2024
Salt Lake City

[Diagram, built up over several slides: tenant environments, operations teams, engineering teams, the argo-cd server, argo-cd resources, and the multi-tenant services multi-1 and multi-2]

Design Goals

  1. Development - Production parity (12 factor X): the same procedures and API in both.
  2. Introduce Tenant Environment Lifecycle management (12 factor XII) for creating, updating and decommissioning tenants.
  3. Use off-the-shelf software to provide a UI for troubleshooting.

[Diagram: the previous components plus a tenant manager API and a lookup component, shown next to backstage, CI/CD and CRM]

API

first

[Diagram: E (Argo Events), WF (Argo Workflows) and pub/sub are added to the previous picture]

Argo WF + E

modernising

[Diagram: lifecycle jobs lfcyc-1 and lfcyc-2 appear next to multi-1 and multi-2]

Hooking in Multi Tenant

making the lifecycle available

[Diagram: E, WF and lookup, with multi-1, multi-2 and multi-3 each paired with a lifecycle job (lifecycle-1, lfcyc-2, lfcyc-3)]

No custom CRD

just reusing the Kubernetes Job resource spec

apiVersion: batch/v1
kind: Job
metadata:
  name: my-job
  namespace: my-namespace
  annotations:
    argocd.argoproj.io/sync-options: Force=true,Replace=true
    collibra.com/tenant.lifecycle: "lifecycle.upsert,lifecycle.migrate-in"
spec:
  suspend: true # keeps the Job from running on its own; it acts as a template
  template:
    spec:
      containers:
        - name: my-container

No custom CRD

custom annotations and suspend
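
The annotation value on the slide is a comma-separated list of operation names. As a minimal sketch of how a tool might interpret it (the helper names `lifecycle_operations` and `is_lifecycle_template` are hypothetical, and requiring `suspend: true` for a match is an assumption rather than something the slides state):

```python
LIFECYCLE_ANNOTATION = "collibra.com/tenant.lifecycle"


def lifecycle_operations(job: dict) -> set:
    """Return the operation names listed in the Job's lifecycle annotation."""
    annotations = job.get("metadata", {}).get("annotations") or {}
    raw = annotations.get(LIFECYCLE_ANNOTATION, "")
    return {op.strip() for op in raw.split(",") if op.strip()}


def is_lifecycle_template(job: dict, operation: str) -> bool:
    """Assumed rule: the Job must be suspended and list the operation."""
    suspended = job.get("spec", {}).get("suspend") is True
    return suspended and operation in lifecycle_operations(job)


# The Job from the slide, reduced to the fields this sketch looks at.
job = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {
        "name": "my-job",
        "namespace": "my-namespace",
        "annotations": {
            LIFECYCLE_ANNOTATION: "lifecycle.upsert,lifecycle.migrate-in",
        },
    },
    "spec": {"suspend": True},
}

print(sorted(lifecycle_operations(job)))
print(is_lifecycle_template(job, "lifecycle.migrate-out"))
```

Keeping the contract in an annotation means a team only has to know the Job spec and one key, which matches the "no custom CRD" point.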

No custom CRD

gives teams control and reuses existing resources

Discovery

auto-discovery: teams can introduce a lifecycle job at any time

[
  {
    "name":"lifecycle-1",
    "namespace":"namespace1"
  },
  {
    "name":"lifecycle-2",
    "namespace":"namespace2"
  }
]

Discovery

expands the Argo workflow
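
To make the discovery output concrete, here is a rough sketch of a discovery step that filters Job manifests by annotation and emits the JSON list shown on the slide. It works on plain dictionaries instead of calling the Kubernetes API, and the function names `discover` and `job` are hypothetical. The real tool is a Go program whose internals the slides do not show.

```python
import json

LIFECYCLE_ANNOTATION = "collibra.com/tenant.lifecycle"


def discover(jobs, operation):
    """Return a JSON list of {name, namespace} for Jobs that list `operation`."""
    matches = []
    for job in jobs:
        meta = job["metadata"]
        raw = (meta.get("annotations") or {}).get(LIFECYCLE_ANNOTATION, "")
        if operation in {op.strip() for op in raw.split(",")}:
            matches.append({"name": meta["name"], "namespace": meta["namespace"]})
    return json.dumps(matches)


def job(name, namespace, operations):
    """Build a minimal Job-like dict for the example."""
    return {"metadata": {"name": name, "namespace": namespace,
                         "annotations": {LIFECYCLE_ANNOTATION: operations}}}


jobs = [
    job("lifecycle-1", "namespace1", "lifecycle.upsert,lifecycle.migrate-in"),
    job("lifecycle-2", "namespace2", "lifecycle.upsert"),
    job("lifecycle-3", "namespace3", "lifecycle.migrate-out"),
]

output = discover(jobs, "lifecycle.upsert")

# Each element of the list would become one parallel branch in the workflow.
for item in json.loads(output):
    print(f"branch: bridge for {item['namespace']}/{item['name']}")
```

Because the list is computed on every run, a newly annotated Job is picked up without any change to the workflow itself.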

withParam: "{{`{{steps.discovery.outputs.parameters.jobs}}`}}"

Bridge

one bridge per discovered job spec

Bridge

the bridge runs in the workflow namespace, the pod in the job spec's namespace

LIFECYCLE_TENANT_ENV_ID=799a8abb-653a-41c9-bec7-a47dba840eba
LIFECYCLE_BATCH_OPERATION_ID=01931b34-6589-7112-83ea-c6640f3f3ac1
LIFECYCLE_OPERATION_NAME=lifecycle.upsert
LIFECYCLE_TENANT_ENV_FILE=/mount/point/environment.yaml
LIFECYCLE_CHANGESET_FILE=/mount/point/changeset.yaml

Lifecycle Container

files are mounted, environment variables are injected
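
From the container's point of view, the contract is just the five `LIFECYCLE_*` variables listed above. A minimal sketch of an entrypoint that reads them could look like this. The `LifecycleContext` class and `load_context` function are hypothetical names, and treating every variable as required is an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class LifecycleContext:
    tenant_env_id: str
    batch_operation_id: str
    operation_name: str
    tenant_env_file: str
    changeset_file: str


def load_context(env=None) -> LifecycleContext:
    """Read the injected LIFECYCLE_* variables, failing fast if one is missing."""
    env = os.environ if env is None else env

    def require(name: str) -> str:
        value = env.get(name)
        if not value:
            raise RuntimeError(f"missing required variable {name}")
        return value

    return LifecycleContext(
        tenant_env_id=require("LIFECYCLE_TENANT_ENV_ID"),
        batch_operation_id=require("LIFECYCLE_BATCH_OPERATION_ID"),
        operation_name=require("LIFECYCLE_OPERATION_NAME"),
        tenant_env_file=require("LIFECYCLE_TENANT_ENV_FILE"),
        changeset_file=require("LIFECYCLE_CHANGESET_FILE"),
    )


# Values copied from the slide; inside a real container os.environ would be used.
example_env = {
    "LIFECYCLE_TENANT_ENV_ID": "799a8abb-653a-41c9-bec7-a47dba840eba",
    "LIFECYCLE_BATCH_OPERATION_ID": "01931b34-6589-7112-83ea-c6640f3f3ac1",
    "LIFECYCLE_OPERATION_NAME": "lifecycle.upsert",
    "LIFECYCLE_TENANT_ENV_FILE": "/mount/point/environment.yaml",
    "LIFECYCLE_CHANGESET_FILE": "/mount/point/changeset.yaml",
}

ctx = load_context(example_env)
print(ctx.operation_name, ctx.tenant_env_file)
```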

Bridge <> Lifecycle Container

logs are streamed back to make them visible

Lifecycle Container

our Ansible/Terraform code, handled like any multi-tenant service

New - migrate out

migrating out of regions, inject object storage

annotations:
  collibra.com/tenant.lifecycle: "lifecycle.migrate-out"

New - migrate in

the other side, import from object storage

annotations:
  collibra.com/tenant.lifecycle: "lifecycle.migrate-in"
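
Since one job template can list several operations, a lifecycle container needs some way to pick the right code path from `LIFECYCLE_OPERATION_NAME`. The sketch below shows one possible dispatch pattern. The `handles` decorator, the handler bodies and the exit code `2` are illustrative assumptions, not the speaker's implementation.

```python
import sys

HANDLERS = {}


def handles(operation):
    """Register a function as the handler for one lifecycle operation."""
    def register(func):
        HANDLERS[operation] = func
        return func
    return register


@handles("lifecycle.migrate-out")
def migrate_out(tenant_env_id):
    # Placeholder: a real handler would export tenant data to object storage.
    return f"exported {tenant_env_id}"


@handles("lifecycle.migrate-in")
def migrate_in(tenant_env_id):
    # Placeholder: a real handler would import tenant data from object storage.
    return f"imported {tenant_env_id}"


def run(operation, tenant_env_id):
    """Return a process exit code: 0 on success, 2 for an unknown operation."""
    handler = HANDLERS.get(operation)
    if handler is None:
        print(f"unsupported operation: {operation}", file=sys.stderr)
        return 2
    print(handler(tenant_env_id))
    return 0


exit_code = run("lifecycle.migrate-in", "799a8abb-653a-41c9-bec7-a47dba840eba")
```

A non-zero return value matters here because the bridge waits for the container's exit code to decide whether the step succeeded.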

Admin Processes (XII)

reuse for enabling remote admin

annotations:
  collibra.com/tenant.lifecycle: "team.operation"

[Diagram: the same picture with a namespace lifecycle and the argo-cd server added]

Future

full containerisation

Discovery - Bridge

  • A single small command-line program (Go)
  • Born out of necessity: Argo Workflows doesn't execute jobs in other namespaces
  • Interacts with the Kubernetes API to read the job spec, create pods, wait for the exit code and read the logs
  • Discovery outputs JSON that the workflow picks up to create parallel branches
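
The "read job spec, create pods" step can be pictured as a pure transformation from a Job manifest to a Pod manifest. The following is only a sketch of that idea on plain dictionaries: `pod_from_job`, the name suffix and the example image are hypothetical, and the actual bridge is a Go program that talks to the Kubernetes API.

```python
import copy


def pod_from_job(job, extra_env, suffix):
    """Build a Pod manifest from a Job's pod template, adding extra env vars."""
    meta = job["metadata"]
    pod_spec = copy.deepcopy(job["spec"]["template"]["spec"])
    # A bare Pod defaults to restartPolicy Always; a run-once task should not restart.
    pod_spec.setdefault("restartPolicy", "Never")
    injected = [{"name": k, "value": v} for k, v in sorted(extra_env.items())]
    for container in pod_spec.get("containers", []):
        container["env"] = list(container.get("env", [])) + [dict(e) for e in injected]
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": f"{meta['name']}-{suffix}",
            "namespace": meta["namespace"],  # the Pod lands in the Job's namespace
        },
        "spec": pod_spec,
    }


job = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {"name": "my-job", "namespace": "my-namespace"},
    "spec": {
        "suspend": True,
        "template": {
            "spec": {
                # The image name is a made-up placeholder for this example.
                "containers": [{"name": "my-container", "image": "example/lifecycle:latest"}],
            }
        },
    },
}

pod = pod_from_job(job, {"LIFECYCLE_OPERATION_NAME": "lifecycle.upsert"}, "run1")
print(pod["metadata"]["namespace"], pod["metadata"]["name"])
```

The deep copy keeps the suspended Job untouched, so the same template can be reused for every tenant and every operation.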

Result

  • We have an API to manage our thousands of tenant environments (the same in production and development)
  • Under the hood it runs on Argo Events and Argo Workflows, yet only job templates are exposed to teams for managing lifecycles, which gives them enough control
  • Argo CD GitOps flows are exposed to teams developing multi-tenant services, including the means to distribute the job templates

coordinates:

https://slides.com/alexvanboxel/tenant_management_argocon_na2024/

https://alexvanboxel.name/

Global Tenant Management using Argo

By Alex Van Boxel

For the Collibra data intelligence platform, a tenant is a combination of applications provisioned on a mixture of virtual machines and single- and multi-tenant Kubernetes services. This combination makes tenant lifecycle management non-trivial. In this session, we’ll explore using Argo Events, Workflows, and CD to build a global tenant management system. The system supports tenant lifecycle events like create, update, suspend, backup, and restore across various application types and infrastructure without exposing Argo-specific constructs to the application teams. This abstraction, called bridged workflows, allows teams to hook into the lifecycles in a simple Kubernetes-native way, giving operations and developers the same simple way to participate in the global tenant management lifecycle. The case study will give you ideas for building a global tenant management system using the Argo suite.