Kyle Rockman
Lead Infrastructure Engineer OpsLevel.com
Mission
Go with the Community
Trust but Verify
Freedom with Responsibility
Publish then Iterate then Automate
Mission
Go with the Community
When faced with a technology decision, we should always default to the most common decision in the open source community, leveraging the experience of many over the experience of few.
Mission
Trust but Verify
We assume proficiency and good faith in our team. We expect to build in controls and processes that allow us to observe the system at all times, and introspect the system when necessary.
Mission
Freedom with Responsibility
Engineering teams are entrusted with a core responsibility to support the business, and because we trust our teams, they have the freedom to operate and satisfy their mission without the need for gatekeepers.
Mission
Publish then Iterate then Automate
Build the smallest possible solution and iterate on it to ensure that it meets our needs. When the solution has matured enough, we automate away the mundane tasks.
UACF Trail Map
Develop
Deliver
Observe
Improve
The Cloud Native landscape has a large number of options. This trail map is the recommended process for leveraging the platform provided by the infrastructure team.
UACF Trail Map
Common Requirements
Develop
The process of conceiving, specifying, designing, programming, documenting, testing, and bug fixing involved in creating and maintaining applications
Commit
The prevailing tool for code revision tracking is Git, and here at UA Digital we push our Git commits to GitHub Cloud.
We expect the chosen tool to have/be:
Status: Gold
Build
The prevailing approach to building a cloud native service is containerization, and here at UA Digital we use Docker.
Status: Gold
We expect the chosen tool to have/be:
Test
The prevailing tool for running testing flows is docker-compose. Here at UA Digital we have seen the power and flexibility that comes with running tests across a set of cloud native services by configuring your tests to run as a series of containers in a docker-compose file.
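As a minimal sketch of how such a test flow might be driven, the Python below builds a docker-compose invocation that starts every container in a compose file and reports the exit code of the test container. The file name docker-compose.test.yml and the service name tests are hypothetical, chosen only for illustration; this is not a description of the pipeline actually used at UA Digital.

```python
import subprocess
from typing import List


def build_test_command(compose_file: str, test_service: str) -> List[str]:
    """Build the docker-compose invocation that runs a test container."""
    return [
        "docker-compose",
        "-f", compose_file,
        "up",
        "--build",
        # Stop every container as soon as any one of them exits.
        "--abort-on-container-exit",
        # Use the test container's exit code as the command's exit code.
        "--exit-code-from", test_service,
    ]


def run_tests(compose_file: str = "docker-compose.test.yml",
              test_service: str = "tests") -> int:
    """Run the test flow and return the test container's exit code."""
    command = build_test_command(compose_file, test_service)
    return subprocess.run(command).returncode


if __name__ == "__main__":
    # Print the command rather than running it, so the sketch works without Docker.
    print(" ".join(build_test_command("docker-compose.test.yml", "tests")))
```

Keeping the command construction separate from its execution means the invocation can be checked without Docker installed.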
Status: Gold
We expect the chosen tool to have/be:
Artifacts (Docker)
We expect the chosen tool to have/be:
Status: Beta
The tool used at UA Digital for Docker image management is Google Container Registry.
Artifacts (Non-Docker)
We expect the chosen tool to have/be:
Status: Re-evaluating
The existing tool used at UA Digital for artifact management is Artifactory. While we have seen good adoption of this tool, we feel it has a few shortcomings and are interested in looking into alternatives.
Deliver
All of the activities that make a software system available for use.
Provision
Status: Gold
The prevailing tool for provisioning cloud native infrastructure has taken the world by storm. Here at UA Digital we leverage Kubernetes and its declarative manifests to provision a place for you to run your cloud native service.
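To illustrate what "declarative" means here, the sketch below builds a Kubernetes Deployment manifest as plain data and prints it as JSON, a format kubectl accepts alongside YAML. The service name, image reference, and replica count are hypothetical values for illustration, not an actual UA Digital service.

```python
import json
from typing import Any, Dict


def deployment_manifest(name: str, image: str, replicas: int = 2) -> Dict[str, Any]:
    """Describe the desired state of a service as a Deployment manifest."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector must match the labels on the pod template.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical service name and image, for illustration only.
    manifest = deployment_manifest(
        "example-service", "gcr.io/ua-digital/example-service:1.0.0"
    )
    print(json.dumps(manifest, indent=2))
```

The manifest states what should exist rather than the steps to create it; Kubernetes is responsible for converging the cluster toward that state.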
We expect the chosen tool to have/be:
Configure
Status: Beta
When considering how to configure your cloud native service, you have to think in terms of two worlds: settings and secrets.
Kubernetes natively provides solutions for each of these, but here at UA Digital we have looked specifically at the secrets story provided by Kubernetes and found it lacking. We are currently exploring Vault as an alternative.
Non-secret configuration:
Secret configuration:
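One way a service might keep the two worlds apart is sketched below: non-secret settings are read from prefixed environment variables, and secrets are read from files in a mounted directory. The APP_ prefix and the /var/run/secrets/app path are assumptions made for illustration; they are not conventions stated in this deck, and the sketch does not use the Vault API.

```python
import os
from pathlib import Path
from typing import Dict, Mapping


def load_settings(env: Mapping[str, str], prefix: str = "APP_") -> Dict[str, str]:
    """Collect non-secret settings from environment variables with a prefix."""
    return {
        key[len(prefix):].lower(): value
        for key, value in env.items()
        if key.startswith(prefix)
    }


def load_secrets(secret_dir: Path) -> Dict[str, str]:
    """Read one secret per file from a mounted directory, if it exists."""
    if not secret_dir.is_dir():
        return {}
    return {
        path.name: path.read_text().strip()
        for path in sorted(secret_dir.iterdir())
        if path.is_file()
    }


if __name__ == "__main__":
    settings = load_settings(os.environ)
    # Hypothetical mount path; print only secret names, never their values.
    secrets = load_secrets(Path("/var/run/secrets/app"))
    print(settings, sorted(secrets))
```

Keeping secrets out of the settings dictionary makes it harder to log or print them by accident.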
Rollout
Status: Alpha
Here at UA Digital we have many different needs for rolling out services, but we also need to keep an eye on the question "what happens if an entire Kubernetes cluster needs to be rebuilt?" We are currently testing Flux as the way to roll out Kubernetes changes.
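The cluster-rebuild question is easier to reason about with a small model of reconciliation: compare the declared state with what is running and work out what to change. The sketch below is a simplified illustration of that idea only; it is not how Flux is implemented, and the function and type names are hypothetical.

```python
from typing import Dict, List, Tuple

# Resource name -> manifest, for both the declared and the running state.
Manifests = Dict[str, dict]


def plan(desired: Manifests, actual: Manifests) -> List[Tuple[str, str]]:
    """Return the actions needed to make the actual state match the desired state."""
    actions: List[Tuple[str, str]] = []
    for name, manifest in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != manifest:
            actions.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))
    return actions


if __name__ == "__main__":
    desired = {"api": {"image": "api:2"}, "worker": {"image": "worker:1"}}
    # A freshly rebuilt cluster runs nothing, so every resource is created.
    print(plan(desired, {}))
```

Under this model a rebuilt cluster is just the case where the actual state is empty, so the same process that applies day-to-day changes also restores everything.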
We expect the chosen tool to have/be:
Observe
Maintain regular surveillance over something and register it as being significant
Logs
Status: Re-Evaluating
Often the first thing to observe is log output in real time. Here at UA Digital we recommend using Stern for this. As your service grows and becomes more complex, it becomes harder to search through your logs for a specific thing. For this we are re-evaluating the way we currently provide log search in our platform.
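Logs are easier both to tail and to search when a service writes them to standard output in a structured form. The sketch below uses only the Python standard library to emit one JSON object per log line; the field names are an assumption for illustration, not a format required by the platform.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Format each log record as a single line of JSON."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def get_logger(name: str) -> logging.Logger:
    """Return a logger that writes JSON lines to standard output."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    get_logger("example-service").info("order %s accepted", "1234")
```

Because each line is a self-contained JSON object, a search tool can filter on fields such as level or logger instead of matching free text.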
We expect the chosen tool to have/be:
Stats/Metrics
Status: Re-Evaluating
?
Traces
To increase observability across the distributed system, here at UA Digital we have partnered with Lightstep, a service that helps display the request paths throughout our systems, called "Traces".
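A request path can be reconstructed because every unit of work records which trace it belongs to and which unit of work caused it. The sketch below models that relationship in plain Python; it is a conceptual illustration with hypothetical names and does not use or represent the Lightstep API.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """One unit of work within a trace."""
    name: str
    trace_id: str
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    parent_id: Optional[str] = None


def start_span(name: str, parent: Optional[Span] = None) -> Span:
    """Start a root span, or a child span that shares its parent's trace."""
    if parent is None:
        return Span(name=name, trace_id=uuid.uuid4().hex)
    return Span(name=name, trace_id=parent.trace_id, parent_id=parent.span_id)


if __name__ == "__main__":
    request = start_span("GET /orders")
    query = start_span("db.query", parent=request)
    print(request)
    print(query)
```

When a service calls another service, the trace and span identifiers travel with the request, which is what lets a tracing backend stitch spans from different services into one path.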
We expect the chosen tool to have/be:
Status: Beta
Alerting
Status: Re-Evaluating
?
Exceptions
We expect the chosen tool to have/be:
Finally, to round out the observability pillars, here at UA Digital we want to provide a way to catch exceptions thrown by your service. The current tool for this is Sentry, but it is under re-evaluation.
Status: Re-Evaluating
Improve
To make or become better.
Production Readiness
When you are confident your service is working, it's time to get started on making your service better.
To that end, we have partnered with OpsLevel, a service that helps UA Digital track the production readiness level of our fleet of services.
We expect the chosen tool to have/be:
Status: Beta
Resiliency
We expect the chosen tool to have/be:
To make sure your service is resilient in the face of the chaos that being cloud native brings, here at UA Digital we have partnered with Gremlin, a tool to perform chaos testing.
Status: Gold
Pipelining
The prevailing tool for automating workflows in the past was Jenkins, and we have several clusters here at UA Digital.
As we move towards a more cloud native ecosystem we are re-evaluating the tools we use for automation.
We expect the chosen tool to have/be:
Prevailing Options:
Status: Re-evaluating
Tools Overview
Here is a catalog of all the tools and services that make up the platform the infrastructure team provides.
Source Code
Choice: GitHub Cloud
URL: http://github.uacf.io
SaaS: Yes
Status: Gold
Automation
Choice: Jenkins
URL: https://jenkins.uacf.io
SaaS: No
Status: Under Re-evaluation
Artifacts (Docker)
Choice: Google Container Registry
URL: https://gcr.io/ua-digital
SaaS: Yes
Status: Beta
Artifacts (Non-Docker)
Choice: Artifactory
URL: https://artifactory.uacf.io
SaaS: No
Status: Under Re-evaluation
Secrets
Choice: Vault
URL: https://vault.uacf.io
SaaS: No
Status: Beta
Provision
Choice: Kubernetes
URL: https://kubernetes.uacf.io
SaaS: No
Status: Gold
Rollout
Choice: Flux
URL: N/A
SaaS: No
Status: Alpha
Logging
Choice: Elasticsearch
URL: https://elasticsearch.uacf.io
SaaS: No
Status: Under Re-evaluation
Stats/Metrics
Choice: Grafana Labs
URL: https://grafana.uacf.io
SaaS: Yes
Status: Under Re-evaluation
Traces
Choice: Lightstep
URL: https://app.lightstep.com
SaaS: Yes
Status: Gold
Alerting
Choice: Grafana
URL: https://grafana.uacf.io
SaaS: No
Status: Under Re-evaluation
Exceptions
Choice: Sentry
URL: https://sentry.uacf.io
SaaS: No
Status: Under Re-evaluation
Production Readiness
Choice: OpsLevel
URL: https://app.opslevel.com
SaaS: Yes
Status: Beta
Resiliency
Choice: Gremlin
URL: https://app.gremlin.com
SaaS: Yes
Status: Gold
VPN
Choice: OpenVPN
URL: https://vpn.uacf.io
SaaS: No
Status: Under Re-evaluation
Remote Access
Choice: SSH
URL: N/A
SaaS: No
Status: Under Re-evaluation
By Kyle Rockman
This slide deck covers the past, present, and future states of the Under Armour infrastructure.