Infrastructure: State of the Union

Mission

Go with the Community

Trust but Verify
Freedom with Responsibility

Publish then Iterate then Automate

 

Mission

Go with the Community

When faced with a technology decision, we should always default to the most common decision in the open source community, leveraging the experience of many over the experience of few.

Mission

Trust but Verify

We assume proficiency and good faith in our team. We expect to build in controls and processes that allow us to observe the system at all times, and introspect the system when necessary.

Mission

Freedom with Responsibility

Engineering teams are entrusted with a core responsibility to support the business. Because we trust our teams, they have the freedom to operate and satisfy their mission without the need for gatekeepers.

Mission

Publish then Iterate then Automate

Build the smallest possible solution and iterate on it to ensure that it meets our needs. When the solution has matured enough, automate away the mundane tasks.

UACF Trail Map

Develop

Deliver
Observe
Improve

The cloud native landscape has a large number of options. This trail map is the recommended process for leveraging the platform provided by the infrastructure team.

UACF Trail Map

Common Requirements

  • Self service
  • Great documentation
  • Low maintenance & easy to keep up to date
  • Easy to understand & use
  • Easy to integrate with existing tools
  • High Availability
  • SAML

Develop

The process of conceiving, specifying, designing, programming, documenting, testing, and bug fixing involved in creating and maintaining applications.

Commit

The prevailing tool for code revision tracking is git, and here at UA Digital we push our git commits to GitHub Cloud.

We expect the chosen tool to have/be:

  • Not self-hosted
  • Easy to understand & use
  • Easy to integrate with existing tools
  • SAML

Status: Gold

Build

The prevailing approach to building a cloud native service is containerization, and here at UA Digital we use Docker.

Status: Gold

We expect the chosen tool to have/be:

  • Easy to understand & use
  • Great Documentation
  • Works with languages (java, scala, python)
  • Create a deployable artifact
  • Easy to integrate with pipelining tools

Test

The prevailing tool for running test flows is docker-compose. Here at UA Digital we have seen the power and flexibility that comes from running tests across a set of cloud native services by configuring the tests to run as a series of containers in a docker-compose file.

Status: Gold

We expect the chosen tool to have/be:

  • Easy to understand & use
  • Great Documentation
  • Works with languages (java, scala, python)
  • Easy to integrate with pipelining tools
  • Easy to recreate/rerun a test environment of a set of services

Artifacts (Docker)

We expect the chosen tool to have/be:

  • Easy to understand & use
  • SAML
  • UI
  • Availability globally

Status: Beta

The tool used at UA Digital for Docker container image management is Google Container Registry.

Artifacts (Non-Docker)

We expect the chosen tool to have/be:

  • Easy to understand & use
  • Great Documentation
  • Works with most languages' package managers
  • Easy to integrate with pipelining tools

Status: Re-evaluating

The existing tool used at UA Digital for artifact management is Artifactory. While we have seen good adoption of this tool, we feel it has a few shortcomings and are interested in looking into alternatives.

Deliver

All of the activities that make a software system available for use.

Provision

Status: Gold

The prevailing tool for provisioning cloud native infrastructure has taken the world by storm. Here at UA Digital we leverage Kubernetes and its declarative manifests to provision a place for you to run your cloud native service.

We expect the chosen tool to have/be:

  • Self Service
  • Cloud agnostic
  • Great documentation
  • Easy to understand & use
  • Clean separation of concerns
  • Flexibility for future integrations/changes

Configure

Status: Beta

When considering how to configure your cloud native service, you have to think in terms of two worlds: settings and secrets.

Kubernetes natively provides solutions for each of these, but here at UA Digital we have looked specifically at the secrets story provided by Kubernetes and found it lacking. We are currently exploring Vault as an alternative.

Non-secret configuration:

  • Kubernetes primitives (ConfigMap)

Secret configuration:

  • Vault - https://vault.uacf.io
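The split between settings and secrets can be sketched from the service's point of view. This is a minimal illustration, not the platform's actual integration: it assumes that non-secret settings arrive as environment variables (one common way to expose a ConfigMap) and that a secret has been written to a file by some Vault integration. The variable names `LOG_LEVEL` and `DB_HOST`, the file name `db_password`, and the directory `/var/run/secrets/app` are all hypothetical.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True, repr=False)
class ServiceConfig:
    log_level: str    # non-secret setting
    db_host: str      # non-secret setting
    db_password: str  # secret, never logged

    def __repr__(self) -> str:
        # Mask the secret so it cannot leak through logs or tracebacks.
        return (
            f"ServiceConfig(log_level={self.log_level!r}, "
            f"db_host={self.db_host!r}, db_password='***')"
        )


def load_config(env=None, secret_dir="/var/run/secrets/app"):
    """Build the config from two separate sources.

    Settings come from environment variables (hypothetical names);
    the secret comes from a file in secret_dir (hypothetical path).
    """
    env = os.environ if env is None else env
    password = (Path(secret_dir) / "db_password").read_text().strip()
    return ServiceConfig(
        log_level=env.get("LOG_LEVEL", "INFO"),  # optional, with a default
        db_host=env["DB_HOST"],                  # required, KeyError if missing
        db_password=password,
    )
```

Keeping the two sources separate in code means settings can be changed and reviewed openly while the secret only ever exists in the mounted file and in memory.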

Rollout

Status: Alpha

Here at UA Digital we have many different needs for rolling out services, but we also need to keep an eye on the question "what happens if an entire Kubernetes cluster needs to be rebuilt?" We are currently testing Flux as the way to roll out Kubernetes changes.

We expect the chosen tool to have/be:

  • Easy to understand & use
  • Easy to integrate with existing tools
  • A history of the changes rolled out
  • An easy story for rolling back a bad change
  • Automated
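Two of the expectations above, a history of changes and an easy rollback, can be illustrated with a small model. This is only a sketch of the idea and does not represent how Flux works internally: the class `RolloutHistory`, its methods, and the revision strings are hypothetical. It assumes a rollback is recorded as a new entry that re-applies the previous revision, so the bad change stays visible in the history.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Change:
    revision: str     # e.g. a commit identifier (hypothetical values)
    description: str


class RolloutHistory:
    """Append-only record of rolled-out changes."""

    def __init__(self) -> None:
        self._changes: List[Change] = []

    def roll_out(self, revision: str, description: str) -> Change:
        # Every rollout, including a rollback, is appended to the log.
        change = Change(revision, description)
        self._changes.append(change)
        return change

    def current(self) -> Optional[Change]:
        return self._changes[-1] if self._changes else None

    def roll_back(self) -> Change:
        """Re-apply the revision before the current one as a new entry."""
        if len(self._changes) < 2:
            raise ValueError("nothing to roll back to")
        previous = self._changes[-2]
        return self.roll_out(previous.revision, f"rollback to {previous.revision}")

    def log(self) -> List[Change]:
        # Return a copy so callers cannot rewrite history.
        return list(self._changes)
```

Because nothing is ever deleted, the log answers both "what is running now" and "what was rolled out before", which is the property the bullet list asks for.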

Observe

Maintain regular surveillance over something and register it as being significant.

Logs

Status: Re-Evaluating

Often the first thing to observe is log output in real time. Here at UA Digital we recommend using Stern for this. As your service grows and becomes more complex, it becomes more important to be able to search through your logs for a specific thing. We are re-evaluating the way we currently provide this in our platform.

We expect the chosen tool to have/be:

  • Self Service
  • Easy to understand & use
  • Ability to easily search for specific things
  • Ability to handle disparate volumes of ingestion
  • UI

Stats/Metrics

Status: Re-Evaluating

?

Traces

To increase observability across the distributed system, here at UA Digital we have partnered with Lightstep, a service that helps display the request paths through our systems, called "traces".

We expect the chosen tool to have/be:

  • Self service
  • Great documentation
  • Easy to understand & use
  • Easy to integrate with existing tools/stats pipelines
  • High Availability
  • UI
  • Works with the OpenTracing standard

Status: Beta

Alerting

Status: Re-Evaluating

?

Exceptions

We expect the chosen tool to have/be:

  • Self service
  • Great documentation
  • Easy to understand & use
  • High Availability
  • UI

Finally, to round out the observability pillars, here at UA Digital we want to provide a way to catch exceptions thrown by your service. The current tool for this is Sentry, but it is under re-evaluation.

Status: Re-Evaluating

Improve

To make or become better.

Production Readiness

When you are confident your service is working, it's time to get started on making your service better.

To that end, we have partnered with Ops Level, a service that helps UA Digital track the production readiness level of our fleet of services.

We expect the chosen tool to have/be:

  • Registry of workloads
  • Ability to attribute a score to a workload
  • Ability to automate checking score of a workload
  • UI

Status: Beta
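The expectations above (a registry of workloads, a score per workload, and automated score checking) can be sketched as follows. This is an illustrative model only and is not the Ops Level API: the `Workload` type, the three checks, and the metadata keys `owner`, `runbook_url`, and `tracing` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Workload:
    name: str
    metadata: Dict[str, object] = field(default_factory=dict)


# A check is a named predicate over a workload (hypothetical examples).
Check = Callable[[Workload], bool]

CHECKS: Dict[str, Check] = {
    "has_owner": lambda w: bool(w.metadata.get("owner")),
    "has_runbook": lambda w: bool(w.metadata.get("runbook_url")),
    "emits_traces": lambda w: w.metadata.get("tracing") is True,
}


def score(workload: Workload, checks: Dict[str, Check] = CHECKS) -> float:
    """Return the fraction of checks that pass, from 0.0 to 1.0."""
    if not checks:
        return 0.0
    passed = sum(1 for check in checks.values() if check(workload))
    return passed / len(checks)


def score_registry(registry: List[Workload]) -> Dict[str, float]:
    """Score every workload in the registry; suitable for a scheduled job."""
    return {w.name: score(w) for w in registry}
```

Running `score_registry` on a schedule would give a per-service readiness number that can be tracked over time.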

Resiliency

We expect the chosen tool to have/be:

  • Self service
  • Not self-hosted
  • Easy to understand & use
  • Ability to integrate with workloads inside and outside Kubernetes
  • Ability to schedule chaos testing
  • UI

To make sure your service is resilient in the face of the chaos that being cloud native brings, here at UA Digital we have partnered with Gremlin, a tool for performing chaos testing.

Status: Gold

Pipelining

The prevailing tool for automating workflows in the past was Jenkins, and we have several clusters here at UA Digital.

As we move towards a more cloud native ecosystem we are re-evaluating the tools we use for automation.

We expect the chosen tool to have/be:

  • Integration with GitHub
  • Network access to UA services/tools
  • Low maintenance & easy to update
  • Great Documentation
  • UI
  • Slack integration
  • Access to Secrets Stored in Vault

Prevailing Options:

  • Knative Build
  • GitHub Actions
  • Google Cloud Build
  • Jenkins

Status: Re-evaluating

Tools Overview

Here is a catalog of all the tools and services that make up the platform the infrastructure team provides.

Source Code

Choice: GitHub Cloud

URL: http://github.uacf.io

SaaS: Yes
Status: Gold

Automation

Choice: Jenkins

URL: https://jenkins.uacf.io

SaaS: No
Status: Under Re-evaluation

Artifacts (Docker)

Choice: Google Container Registry

URL: https://gcr.io/ua-digital

SaaS: Yes
Status: Beta

Artifacts (Non-Docker)

Choice: Artifactory

URL: https://artifactory.uacf.io

SaaS: No
Status: Under Re-evaluation

Secrets

Choice: Vault

URL: https://vault.uacf.io

SaaS: No
Status: Beta

Provision

Choice: Kubernetes

URL: https://kubernetes.uacf.io

SaaS: No
Status: Gold

Rollout

Choice: Flux

URL: N/A

SaaS: No
Status: Alpha

Logging

Choice: Elasticsearch

URL: https://elasticsearch.uacf.io

SaaS: No
Status: Under Re-evaluation

Stats/Metrics

Choice: Grafana Labs

URL: https://grafana.uacf.io

SaaS: Yes
Status: Under Re-evaluation

Traces

Choice: Lightstep

URL: https://app.lightstep.com

SaaS: Yes
Status: Gold

Alerting

Choice: Grafana

URL: https://grafana.uacf.io

SaaS: No
Status: Under Re-evaluation

Exceptions

Choice: Sentry

URL: https://sentry.uacf.io

SaaS: No
Status: Under Re-evaluation

Production Readiness

Choice: Ops Level

URL: https://app.opslevel.com

SaaS: Yes
Status: Beta

Resiliency

Choice: Gremlin

URL: https://app.gremlin.com

SaaS: Yes
Status: Gold

VPN

Choice: OpenVPN

URL: https://vpn.uacf.io

SaaS: No
Status: Under Re-evaluation

Remote Access

Choice: SSH

URL: N/A

SaaS: No
Status: Under Re-evaluation

Infrastructure: State of the Union

By Kyle Rockman

This slide deck covers the Under Armour infrastructure past, present and future states.
