By Corey Gale
K8s v1.16
CRDs reach GA
Metrics overhaul
CSI enhancements
Resizing
Clone volumes
Inline volume support in beta (good for ephemeral attachments)
Can attach to a running pod
Example: tcpdump, filesystem inspection
Needs to be turned on
Requirements for graduation: adoption, maintainer diversity, project health
Vitess (graduated)
Cloud-native database: highly scalable and reliable (five nines)
35% of Slack runs on Vitess, with a target of 100% by the end of 2020
JD.com runs Vitess at 35M QPS (30k pods, 4k keyspaces)
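To put the "5 9s" reliability figure in perspective, here is a small arithmetic sketch that converts a number of nines into a yearly downtime budget. The function name `allowed_downtime_minutes` is made up for illustration, and the calculation assumes a 365-day year; it is not taken from the talk.

```python
def allowed_downtime_minutes(nines: int, period_days: float = 365.0) -> float:
    """Return the downtime budget, in minutes, for an availability of N nines."""
    # Five nines means 99.999% availability, i.e. an unavailability of 10^-5.
    unavailability = 10 ** (-nines)
    minutes_in_period = period_days * 24 * 60
    return minutes_in_period * unavailability


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"{n} nines: {allowed_downtime_minutes(n):.2f} minutes/year")
```

Under these assumptions, five nines leaves a little over five minutes of downtime per year.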
Jaeger (graduated)
Open Policy Agent (OPA) (incubating)
Decouples policy definitions and environment/enforcement
Flexible, fine-grained control across the stack
Runs as a sidecar or a host-level daemon
Declarative policy language: Rego
Etcd (incubating)
Can now scale to 5,000-node k8s clusters
NATS
Cloud-native messaging service
Scalable services and streams
Tinder used NATS to migrate poll workloads to push
Added Prometheus exporters & Grafana dashboards
Fluentd, Kafka integrations
Goal: connect everything
ML: 5k+ pods, 10k+ cores
Ridesharing: 100k+ containers (sidecars), 50k+ cores
Lyft CNI stack requirements: VPC native, low latency, high throughput
No overlay network, very low IPvlan overhead
Envoy Manager (EM): sidecars connect to EM
Long-term storage (LTS) options: DynamoDB, Google Bigtable, S3, Google Cloud Storage, Cassandra
Includes tools for auto-scaling LTS
What’s new?
Ingesters can ship blocks instead of chunks
Write-ahead logging for ingesters
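As a rough illustration of what write-ahead logging buys an ingester, the sketch below persists each sample to a log file before applying it in memory, then replays the log on startup. The class `TinyIngester`, its JSON-lines log format, and the sample fields are all assumptions made for this example; they do not reflect the project's actual implementation.

```python
import json
import os


class TinyIngester:
    """Hypothetical in-memory ingester that survives restarts via a WAL."""

    def __init__(self, wal_path: str):
        self.wal_path = wal_path
        self.samples = {}  # series name -> list of (timestamp, value)
        self._replay()  # rebuild in-memory state from any existing log
        self._wal = open(wal_path, "a", encoding="utf-8")

    def _replay(self) -> None:
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                rec = json.loads(line)
                self._apply(rec["series"], rec["ts"], rec["value"])

    def _apply(self, series: str, ts: int, value: float) -> None:
        self.samples.setdefault(series, []).append((ts, value))

    def append(self, series: str, ts: int, value: float) -> None:
        # Write to the log first and force it to disk, then update memory.
        record = {"series": series, "ts": ts, "value": value}
        self._wal.write(json.dumps(record) + "\n")
        self._wal.flush()
        os.fsync(self._wal.fileno())
        self._apply(series, ts, value)

    def close(self) -> None:
        self._wal.close()
```

A second `TinyIngester` opened on the same path after a crash or restart would see the samples written by the first, which is the property that lets an ingester restart without losing unflushed data.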
Problem: we need more labeled data, but what kind of data exactly?
Solution: The Loop workflow (see slide shot)
Uses Amazon SageMaker Ground Truth
Limit: etcd OOM’ing, fixed in etcd v3
Airbnb’s max: ~2,300 nodes per cluster
Approach: workloads can be scheduled on any cluster
Weaveworks, Intuit, Palo Alto Networks (talk link)
Argo Flux
Weaveworks-Intuit-AWS collaboration
Microsoft & IBM (talk link)
Helm 3 announced. Major changes:
No more Tiller
Release "upgrade" strategy
Testing framework
Dependencies moved into manifest
Chart value validation
“3-way merge”
Considers the old manifest, the new manifest and the current live state (accounts for values that were changed manually)
Releases stored as secrets in the same namespace as the release
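The "3-way merge" idea can be sketched with flat dictionaries standing in for manifests. This is a simplified illustration only: the function `three_way_merge` and the sample keys are hypothetical, and Helm itself works on nested Kubernetes objects using strategic merge patches rather than plain dict updates.

```python
def three_way_merge(old: dict, new: dict, live: dict) -> dict:
    """Merge using the old manifest, the new manifest and the live state.

    - Keys the chart used to set but no longer sets are deleted.
    - Keys set by the new manifest win over whatever is live.
    - Keys that exist only in the live state (added out of band) are kept.
    """
    merged = dict(live)
    # Deletions: present in the old manifest, dropped from the new one.
    for key in old:
        if key not in new:
            merged.pop(key, None)
    # Additions and updates: the new manifest is the desired state.
    merged.update(new)
    return merged


if __name__ == "__main__":
    old = {"replicas": 3, "image": "app:1.0", "debug": "true"}
    # Someone scaled the workload down by hand and a sidecar was injected.
    live = {"replicas": 0, "image": "app:1.0", "debug": "true", "sidecar": "proxy"}
    new = {"replicas": 3, "image": "app:1.1"}
    print(three_way_merge(old, new, live))
```

In this sketch the manual `replicas` change is reverted to the chart's value, the removed `debug` key is deleted, and the out-of-band `sidecar` key survives. A two-way comparison of old and new manifests alone would miss both the drift and the injected key.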
Reddit scale:
500M+ monthly active users
16M+ posts, 2.8B+ votes per month
After: 1 cluster per AZ (3 clusters per region)
Cost and latency savings from siloed AZs.
Mirrored clusters have prevented outages.
More clusters, more admin overhead.