West LA DevOps
June 13, 2019
meetup.com/West-LA-DevOps
Agenda
- Job Board
- Industry Updates
- Talk #1: Geodesic Cloud Automation Shell
- Talk #2: Prometheus: how we ditched our legacy monitoring systems
About GumGum
- Computer Vision company
- Advertising division
- Context-aware ads
- Brand safety technology
- Sports division
>20M RPM
Online Advertising
Did you know?
GumGum invented In-Image advertising in 2008
GumGum Sports
Job Board
- DevOps Engineer @ GumGum
- Your company?
Industry Updates
Since the last time we met...
1) Ubuntu 14.04 EOL
- No more updates as of April 30, 2019
- Original release: April 2014
- Did you know? Ubuntu versions are numbered YY.MM, so 14.04 means April 2014.
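The YY.MM convention can be sketched in a few lines of Python. This is a minimal illustration only: the helper name ubuntu_release_month is hypothetical, the century is assumed to be 2000, and the day of the month is a placeholder.

```python
from datetime import date


def ubuntu_release_month(version: str) -> date:
    """Map an Ubuntu version string (YY.MM) to its release month.

    Assumptions: releases fall in the 2000s, and the day is fixed to 1
    because the version number only encodes year and month.
    """
    # Point releases such as "18.04.2" keep the same YY.MM prefix.
    year, month = version.split(".")[:2]
    return date(2000 + int(year), int(month), 1)


print(ubuntu_release_month("14.04").strftime("%B %Y"))
```

Under these assumptions, "14.04" maps to April 2014, matching the original release date above.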
2) Shopify tests Istio
- Benchmarking Istio & Linkerd CPU by Michael Kipper
- Shopify was working on deploying Istio as their service mesh, but they hit a wall: cost
- From Istio's docs: "As of Istio 1.1, a proxy consumes about 0.6 vCPU per 1000 requests per second."
- With a proxy on each side of a connection, this equates to 1,200 cores for the proxies alone per million requests per second, which would cost Shopify about $50k/month per 1M RPS in GCP
- Istio control plane: ~750 mcores
- Linkerd control plane: ~22 mcores
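The proxy cost arithmetic above can be reproduced with a short sketch. The 0.6 vCPU figure and the $50k/month figure come from the slide. The factor of two proxies per request is an assumption that makes the 1,200-core figure consistent with the quoted rate. The per-core price is derived from those numbers, not taken from a GCP price list.

```python
VCPU_PER_1000_RPS = 0.6            # from the Istio 1.1 docs quote
PROXIES_PER_REQUEST = 2            # assumption: one sidecar on each side of a connection
MONTHLY_COST_PER_1M_RPS = 50_000   # USD, figure quoted on the slide


def proxy_cores(requests_per_second: float,
                proxies_per_request: int = PROXIES_PER_REQUEST) -> float:
    """Estimate the vCPUs consumed by sidecar proxies at a given request rate."""
    return requests_per_second / 1000 * VCPU_PER_1000_RPS * proxies_per_request


cores = proxy_cores(1_000_000)
# Per-core price implied by the slide's numbers (derived, not an official price).
implied_cost_per_core = MONTHLY_COST_PER_1M_RPS / cores

print(f"{cores:.0f} cores, about ${implied_cost_per_core:.2f}/core/month")
```

Changing proxies_per_request to 1 gives the 600 cores implied by the documented rate alone.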
3) Stack Overflow breach
- On May 5, 2019, attackers gained access to the development tier for stackoverflow.com by exploiting a bug deployed the same day
- The attackers spent 5 days exploring and then escalated their access to the production systems
- An internal investigation revealed that the attackers obtained names, email addresses and IP addresses of Stack Exchange users
- Stack Overflow has contracted a third-party forensics and incident response firm to assist its investigation, and says it’s resetting passwords and taking other “precautionary measures” in response to the incident
4) Intel ZombieLoad exploit
- CPU hardware exploit similar to Meltdown and Spectre
- Allows attackers to leak arbitrary in-flight data from CPU-internal buffers (Line Fill Buffers, Load Ports, Store Buffers), including data never stored in CPU caches
- According to HN, Intel attempted to play down the issue by offering the researchers the $40k reward tier plus a separate $80k "gift" (which the researchers politely declined), instead of the maximum $100k reward for finding a critical vulnerability
- Check out mdsattacks.com for the attack details
5) Docker Hub breach
- On April 25, 2019, Docker Hub detected a brief period of unauthorized access to a production database
- Sensitive data from ~190k accounts could have been exposed (<5% of total users)
- Leaked data includes usernames and hashed passwords for a small percentage of these users, as well as GitHub and Bitbucket tokens for autobuilds
- If you use Docker Hub autobuilds, please check whether your GitHub/Bitbucket API tokens have been used to push unexpected changes to your integrated repos
6) Azure outage
- DNS outage for over 2 hours (19:43-22:35 UTC)
- Caused by a migration from a legacy DNS system to Azure DNS
- Affected many Microsoft services
- SQL servers, Azure Postgres, Storage, and Azure Active Directory were among the services rendered unusable
- Outage affected all regions and availability zones, so being region-redundant would not have helped
7) Terraform 0.12 Released
- Iteration construct: for expressions introduced
- Type system: allows for complex types and improves usage of data structures (nested maps and lists)
- First-class expressions: no more need for string interpolation syntax, e.g. "${aws_vpc.this.id}" becomes aws_vpc.this.id
- The Terraform team provides an upgrade command (terraform 0.12upgrade) to ease migrating from 0.11 to 0.12
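To make the first-class expression change concrete, here is a small Python sketch of the rewrite it implies: a quoted value that consists of a single "${...}" interpolation can drop the wrapper, while strings that mix literal text and interpolation stay as they are. This only illustrates the syntax difference. The function name to_first_class is hypothetical, and this is not how Terraform's own upgrade tooling is implemented.

```python
import re

# Matches a quoted string that is exactly one interpolation, e.g. "${aws_vpc.this.id}".
_WHOLE_INTERPOLATION = re.compile(r'^"\$\{([^{}"]+)\}"$')


def to_first_class(value: str) -> str:
    """Return the 0.12-style expression for a 0.11-style attribute value.

    Only values that are a single interpolation are rewritten; anything
    else (literal strings, mixed templates) is returned unchanged.
    """
    match = _WHOLE_INTERPOLATION.match(value.strip())
    return match.group(1).strip() if match else value


print(to_first_class('"${aws_vpc.this.id}"'))
print(to_first_class('"vpc-${var.name}"'))
```

Mixed templates such as "vpc-${var.name}" are still valid in 0.12, which is why the sketch leaves them untouched.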
8) Tech IPO season
- Lyft, Uber, PagerDuty, Zoom, and Pinterest are some of the well-known tech IPOs of 2019
- Uncertainty in the market might be causing some VCs to want to cash out now
- Slack, Airbnb, and CrowdStrike are among those highly anticipated this year as well
- Ride-sharing/consumer apps have not fared well in this market, but B2B enterprise SaaS has done very well
9) Google Cloud Outage
- "Two normally benign misconfigurations, and a specific software bug, combined to initiate the outage"
- Outage lasted for over four hours starting at 2:58 p.m. EST
- Google services affected included Gmail, YouTube, Docs, Drive, and Hangouts
- Many users of GCP like Snap, Shopify, and Pokemon Go were also affected
- Cluster management software (Borg? K8s?) was accidentally included in a maintenance event, causing many clusters to be descheduled
10) DigitalOcean kills startup
- @w3nicolas runs a startup that hosts their service on DO
- They periodically need to scale up to more droplets to run a script in parallel
- DigitalOcean's automation flagged this as crypto mining
- The account was locked on two separate occasions; the first time, the human Abuse Operations agent failed to flag the account as approved
- As automation does, it flagged the account again, and this time the second Abuse Operations agent denied access
- DigitalOcean reached out to the customer to apologize and offer more credits
Geodesic Cloud Automation Shell
The easy way to automate everything
By Erik Osterman, Cloud Architect @ Cloud Posse
Prometheus
How we ditched our legacy monitoring systems
By Florian Dambrine, MLOps Engineer @ GumGum
Next WLAD Meetup
Date: mid-August 2019
Next Meetup Preview
- Continuously delivering Terraform
- ?? Maybe you ??
Getting Involved
- Have a handy tip?
- Want to speak at WLAD?
Email us: westladevops@gmail.com
...or message us on Meetup
...or talk to us right now.
West LA DevOps: The 2nd Meetup
By Corey Gale