West LA DevOps

June 13, 2019

meetup.com/West-LA-DevOps

Agenda

  1. Job Board
  2. Industry Updates
  3. Talk #1: Geodesic Cloud Automation Shell
  4. Talk #2: Prometheus: how we ditched our legacy monitoring systems

About GumGum

  • Computer Vision company
  • Advertising division
    • Context-aware ads
    • Brand safety technology
  • Sports division

>20M RPM

Online Advertising

Did you know?

GumGum invented in-image advertising in 2008

GumGum Sports

Job Board

  1. DevOps Engineer @ GumGum
  2. Your company?

Industry Updates

Since the last time we met...

1) Ubuntu 14.04 EOL

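  • No more standard (free) updates as of April 30, 2019; only paid Extended Security Maintenance after that
    • Original release: April 2014
  • Did you know? Ubuntu versions are YY.MM, the year and month of release (14.04 = April 2014)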
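A small sketch of the YY.MM scheme: the helper below (a hypothetical name, not part of any Ubuntu tooling) turns a version string into its release month. It assumes the two-digit year falls in the 2000s, which holds for every Ubuntu release so far (the first was 4.10, October 2004).

```python
from datetime import date


def ubuntu_release_month(version: str) -> date:
    """Return the first day of the release month for an Ubuntu YY.MM version.

    Point releases and suffixes such as "14.04.6 LTS" are accepted by keeping
    only the leading YY.MM part.
    """
    core = version.split()[0]        # "14.04.6 LTS" -> "14.04.6"
    yy, mm = core.split(".")[:2]     # -> "14", "04"
    # Assumption: two-digit years map to 20YY.
    return date(2000 + int(yy), int(mm), 1)
```

For example, ubuntu_release_month("14.04") returns date(2014, 4, 1), matching the original release date on the slide.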

2) Shopify tests Istio

  • Source: "Benchmarking Istio & Linkerd CPU" by Michael Kipper
  • Shopify was working on deploying Istio as its service mesh, but hit a wall: cost
  • From Istio's docs: "As of Istio 1.1, a proxy consumes about 0.6 vCPU per 1000 requests per second."
  • Since each request passes through two proxies (one on each side of the connection), this equated to 1,200 cores for the proxies alone per million requests per second, which on GCP would cost Shopify about $50k/month per 1M RPS
  • Istio control plane: ~750 mcores (millicores)
  • Linkerd control plane: ~22 mcores
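The cost figures above can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions: the factor of two assumes one sidecar proxy on each side of every connection (which is how 0.6 vCPU per 1000 RPS becomes 1,200 cores per 1M RPS), and the per-core price is only implied by the $50k/month figure, not taken from GCP's price list. The function and constant names are illustrative.

```python
# Figures quoted on the slide (Istio 1.1 docs and Shopify's estimate).
VCPU_PER_1000_RPS_PER_PROXY = 0.6
MONTHLY_COST_PER_1M_RPS_USD = 50_000

# Assumption: each request crosses two sidecar proxies (client side and server side).
PROXIES_PER_REQUEST = 2


def proxy_cores(rps: float, proxies: int = PROXIES_PER_REQUEST) -> float:
    """vCPUs spent in sidecar proxies to sustain `rps` requests per second."""
    return VCPU_PER_1000_RPS_PER_PROXY * (rps / 1000) * proxies


cores = proxy_cores(1_000_000)                                # 1,200 vCPUs
implied_price_per_core = MONTHLY_COST_PER_1M_RPS_USD / cores  # USD per core per month
control_plane_ratio = 750 / 22                                # Istio vs Linkerd control plane mcores

print(f"{cores:.0f} cores, ~${implied_price_per_core:.2f}/core/month implied, "
      f"Istio control plane ~{control_plane_ratio:.0f}x Linkerd's")
# prints: 1200 cores, ~$41.67/core/month implied, Istio control plane ~34x Linkerd's
```

Dropping the factor of two (proxies=1) gives 600 cores, which is why the per-proxy number from the docs alone understates the mesh-wide cost.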

3) Stack Overflow breach

  • On May 5, 2019, attackers gained access to the development tier of stackoverflow.com by exploiting a bug deployed the same day
  • The attackers spent 5 days exploring before escalating their access to the production systems
  • Internal investigation revealed the attackers obtained names, email addresses and IP addresses of Stack Exchange users
  • Stack Overflow has contracted a third-party forensics and incident response firm to assist its investigation, and says it’s resetting passwords and taking other “precautionary measures” in response to the incident

4) Intel ZombieLoad exploit

  • CPU hardware exploit similar to Meltdown and Spectre
  • Allows leaking arbitrary in-flight data from CPU-internal buffers (Line Fill Buffers, Load Ports, Store Buffers), including data never stored in CPU caches
  • According to HN, Intel attempted to play down the issue by offering the researchers the $40k tier reward plus a separate $80k "gift" (which the researchers declined) instead of the maximum $100k reward for a critical vulnerability (source)
  • Check out mdsattacks.com for the attack details

5) DockerHub breach

  • On April 25, 2019, Docker Hub detected a brief period of unauthorized access to a production database
  • Sensitive data from ~190k accounts could have been exposed (<5% of total users)
  • Exposed data included usernames and hashed passwords for a small percentage of these users, as well as GitHub and Bitbucket tokens used for autobuilds
  • If you use Docker Hub autobuilds, check whether your GitHub/Bitbucket API tokens have been used to push unexpected changes to your linked repos
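One rough way to start the check suggested above is to list recent commits in each repo linked to an autobuild and flag authors you don't recognize. This is a minimal sketch, assuming a local clone with all refs fetched (git fetch --all) and a hypothetical allowlist of expected author emails; the helper names are illustrative. Git does not record push times, and commit dates and author fields can be forged, so treat this as a first pass alongside your Git host's own audit logs, not as proof either way.

```python
import subprocess
from datetime import date

# Hypothetical allowlist: replace with the author emails you expect in this repo.
EXPECTED_AUTHORS = {"alice@example.com", "bob@example.com"}

# Tab-separated fields: commit hash, author email, author date.
LOG_FORMAT = "%H%x09%ae%x09%ad"


def parse_git_log(output: str) -> list:
    """Parse `git log --pretty=format:%H%x09%ae%x09%ad` output into (sha, email, date) tuples."""
    commits = []
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        sha, email, when = line.split("\t", 2)
        commits.append((sha, email, when))
    return commits


def recent_commits(repo_path: str, since: date) -> list:
    """Return commits on all local refs with a commit date after midnight on `since`."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--all",
         f"--since={since.isoformat()} 00:00:00",
         f"--pretty=format:{LOG_FORMAT}", "--date=iso"],
        capture_output=True, text=True, check=True,
    )
    return parse_git_log(result.stdout)


def unexpected_commits(commits: list, expected_authors: set) -> list:
    """Keep only commits whose author email is not on the allowlist."""
    return [c for c in commits if c[1] not in expected_authors]
```

For example, unexpected_commits(recent_commits("/path/to/repo", date(2019, 4, 25)), EXPECTED_AUTHORS) lists commits since the breach date whose authors are not on the allowlist.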

6) Azure outage

  • DNS outage lasting almost 3 hours (19:43-22:35 UTC)
  • Caused by a migration from a legacy DNS system to Azure DNS
  • Affected many Microsoft services
  • SQL servers, Azure Postgres, Storage, and Azure Active Directory were among the services rendered unavailable
  • The outage affected all regions and availability zones, so being region-redundant would not have helped

7) Terraform 0.12 Released

  • Iteration construct: for expressions introduced
  • Type system: supports complex types (nested maps and lists), improving how data structures are used
  • First-class expressions: references no longer need string interpolation syntax, e.g. aws_vpc.this.id instead of "${aws_vpc.this.id}"
  • The Terraform team provides a migration command (terraform 0.12upgrade) to ease upgrading from 0.11 to 0.12

8) Tech IPO season

  • Lyft, Uber, PagerDuty, Zoom, and Pinterest are some of the well-known tech IPOs of 2019
  • Market uncertainty might be prompting some VCs to cash out now
  • Slack, Airbnb, and CrowdStrike are among those highly anticipated this year as well
  • Ride-sharing/consumer apps have not fared well in this market, but B2B enterprise SaaS has done very well

9) Google Cloud Outage

  • "Two normally benign misconfigurations, and a specific software bug, combined to initiate the outage"
  • The outage lasted over four hours, starting at 2:58 p.m. ET
  • Affected Google services included Gmail, YouTube, Docs, Drive, and Hangouts
  • Many GCP users, such as Snap, Shopify, and Niantic (Pokémon Go), were also affected
  • Cluster management software (Borg? K8s?) was accidentally included in a maintenance event, causing many clusters to be descheduled

10) DigitalOcean kills startup

  • @w3nicolas runs a startup that hosts its service on DigitalOcean (DO)
  • They periodically need to run a script that requires scaling up to more droplets so it can run in parallel
  • DigitalOcean's automation flagged this as crypto mining
  • The account was locked on two separate occasions: the first time, the human Abuse Operations agent who reviewed it failed to mark the account as approved
  • Since the account was never marked as approved, the automation flagged it again, and this time a second Abuse Operations agent denied access
  • DigitalOcean reached out to the customer to apologize and offer more credits

Geodesic Cloud Automation Shell

The easy way to automate everything

By Erik Osterman, Cloud Architect @ Cloud Posse

Prometheus

How we ditched our legacy monitoring systems

By Florian Dambrine, MLOps Engineer @ GumGum

Next WLAD Meetup

Date: mid-August 2019

Next Meetup Preview

  • Continuously delivering Terraform
  • ?? Maybe you ??

Getting Involved

  • Have a handy tip?
  • Want to speak at WLAD?


Email us: westladevops@gmail.com

...or message us on Meetup

...or talk to us right now.