GKE In Production

@joshkurz

Options To Run K8s

Why Run on GKE

  • Google Invented K8s
  • Google's cloud is pretty much centered around GKE
  • Automatic Upgrades
  • Fastest CVE upgrades on the market
  • Google's Global Networking
  • Built in Audit Logs
  • Built in Stackdriver Application Logs
  • Locked Down K8s Master APIs
  • Integration Options with RBAC and Google-Hosted IAM Access Tokens
  • Cheaper CPU costs than AWS (without contractual discounts)
  • Project structure vs Account Structure
  • Container Registry
  • Google Support

Issues With GKE

  • Default Clusters are Not Production Ready
  • Networking complexities to bridge cloud and on-prem networks
  • GCP is still less familiar to most teams than AWS or Azure
  • SSL is not as easy as ACM, but what is?
  • How to make auth easy for users and keep things secure?

Best Practices For GKE

  • Use Shared VPC networks for private clusters
  • Create small subnets for each cluster. Default firewall rules do not allow east-west traffic between clusters in different subnets inside the same network.
  • Turn off security-risk plugins: the Kubernetes Dashboard, legacy auth, and basic auth.
  • Manage the cluster and node pools separately. 
  • Use the GCP Ingress Controller as much as possible and learn about its features.
  • Create regional clusters over zonal clusters, unless you know that your workloads can only run in specific zones in a region. 
  • https://live.paloaltonetworks.com/t5/Community-Blog/Exploring-Google-Kubernetes-Engine-GKE-Security/ba-p/249971 

Shared Networks

  • A very useful tool that allows total control over networking, which can be shared out to individual projects
  • Access can be granted to individual users for individual subnets.
  • Requires Cloud NAT to be installed in the network; only one NAT per network is needed. Two options for NAT IPs: auto-allocated or self-managed. We opted for self-managed, so we know which IPs our traffic is coming from. The downside is we have to add IPs if our egress gets too high and we start dropping packets (~64k ports per IP). See the sketch after this list.
  • One subnet per GKE cluster. This does mean you should use subnets that are as small as possible, so you don't run out of IP space.
  • Use VPC Service Controls
  • Turn on Flow Logs per subnet and network
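
A minimal sketch of the self-managed NAT setup described above; the network, router, region, and IP names are placeholders:

    # reserve a static IP so egress traffic comes from a known address
    gcloud compute addresses create nat-ip-1 --region=us-east1
    # one Cloud Router per network/region, with a NAT config attached
    gcloud compute routers create my-router --network=my-shared-vpc --region=us-east1
    gcloud compute routers nats create my-nat --router=my-router --region=us-east1 \
        --nat-external-ip-pool=nat-ip-1 --nat-all-subnet-ip-ranges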

Master API Security

  • Lock down the master API to trusted networks, or don't let any networks hit the public IP address of the master and communicate only with the private IP address.
  • Allows for Zero Trust methodology on k8s apis
  • Perform master credential rotation; this recreates the master IP at the same time (a sketch of both steps follows)
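
A minimal sketch, assuming the cluster name and the trusted CIDR are placeholders:

    # restrict the public master endpoint to trusted networks
    gcloud container clusters update my-cluster \
        --enable-master-authorized-networks \
        --master-authorized-networks=203.0.113.0/24
    # rotate the master credentials; this also recreates the master IP
    gcloud container clusters update my-cluster --start-credential-rotation
    gcloud container clusters update my-cluster --complete-credential-rotation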

K8s Internal Security

  • Use firewalls between each GKE cluster in the same network. Do not allow GKE clusters to talk to other GKE clusters in your network at layer 4; make them go through layer 7 to talk to other applications.
  • Use Kubernetes Network Policies (see the sketch after this list)
  • Use Pod Security Policies
  • Using Istio's automatic sidecar injector on all pods is another option to secure east/west traffic at a higher layer
  • For Node Security, Either Create a new service account with limited permissions for each cluster to use, or set node IAM permissions to be least privileged. 
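
A minimal sketch of a default-deny ingress NetworkPolicy, assuming the namespace name team-a is a placeholder:

    # deny all ingress to pods in the namespace unless another policy allows it
    kubectl apply -f - <<EOF
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: default-deny-ingress
      namespace: team-a
    spec:
      podSelector: {}
      policyTypes:
      - Ingress
    EOF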

GKE RBAC

  • Can use gcloud access token
  • gcloud config config-helper --format=json | jq -r .credential.access_token
  • Can take it a step further and create applications that return access_tokens to users via an API or local clients, using a Google-hosted OIDC application
  • Map users to namespaces via rbac. Automate this step.
  • Create least-privileged RoleBindings that allow users to do only what is necessary (see the sketch after this list)
  • You get all audit logs by default
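
A minimal sketch of the namespace mapping, assuming team-a and the user email are placeholders; the built-in edit ClusterRole is scoped to the namespace by the binding:

    kubectl apply -f - <<EOF
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: team-a-edit
      namespace: team-a
    subjects:
    - kind: User
      name: user@example.com
      apiGroup: rbac.authorization.k8s.io
    roleRef:
      kind: ClusterRole
      name: edit
      apiGroup: rbac.authorization.k8s.io
    EOF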

Node Pools

  • Create Clusters
  • Delete default node pool
  • Create node pools for specific use
  • Use preemptible nodes for nonprod
  • Use Auto Upgrades
  • Use Auto Node Repair
  • Use AutoScaling
  • Opens up many options for having different types of workloads in the same cluster, via taints, tolerations, and other k8s pod-targeting mechanisms (a sketch of the full flow follows)
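
A minimal sketch of the pattern above; cluster/pool names, machine type, sizes, and the taint are placeholders:

    gcloud container clusters create my-cluster --region=us-east1
    gcloud container node-pools delete default-pool --cluster=my-cluster --region=us-east1
    gcloud container node-pools create workers \
        --cluster=my-cluster --region=us-east1 \
        --machine-type=n1-standard-4 \
        --enable-autoupgrade --enable-autorepair \
        --enable-autoscaling --min-nodes=1 --max-nodes=5 \
        --preemptible \
        --node-taints=env=nonprod:NoSchedule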

GCP Ingress Controller

  • Creates Layer 7 LoadBalancers
  • Create a BackendConfig for each Service (see the sketch after this list)
  • Use Cloud Armor and Cloud IAP for Private Applications
  • Use Cloud CDN if needed
  • Set Timeouts, Session Affinity, and Connection Draining when applicable
  • Automatic StackDriver Metrics
  • Automatic StackDriver Logs on LB
  • You can still create internal or external network load balancers via k8s Service objects if that is a requirement. You just miss all the goodies the layer 7 LB brings.
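
A minimal sketch of a BackendConfig with the settings above; names and values are placeholders, and the annotation key shown (beta.cloud.google.com/backend-config) is the one used on GKE versions of this era:

    kubectl apply -f - <<EOF
    apiVersion: cloud.google.com/v1beta1
    kind: BackendConfig
    metadata:
      name: my-backendconfig
    spec:
      timeoutSec: 60
      connectionDraining:
        drainingTimeoutSec: 30
      sessionAffinity:
        affinityType: CLIENT_IP
    EOF
    # attach it to a Service so the ingress controller applies it to the LB backend
    kubectl annotate service my-service \
        beta.cloud.google.com/backend-config='{"default": "my-backendconfig"}'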

Regional/Zonal Clusters

  • Regional clusters automatically provision nodes in each zone of the region. This gives HA by default.
  • Zonal clusters can do the same as regional clusters; you just have to specify that you want your node pools to run in all zones. The benefit of zonal clusters is the ability to choose specific zones. Why would you not want to run in all zones in a region? Because not all node types are available in all zones. A sketch of both follows.
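
A minimal sketch of both; cluster names, region, and zones are placeholders:

    # regional: nodes are spread across the region's zones automatically
    gcloud container clusters create ha-cluster --region=us-east1
    # zonal, pinned to specific zones (e.g. where a needed node type exists)
    gcloud container clusters create picky-cluster --zone=us-east1-b \
        --node-locations=us-east1-b,us-east1-c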

Project Structure / Container Registry

  • Create Projects Per Logical Group
  • Allows more open IAM access for each group, without concern of a team muddling up another team's resources
  • Create registries per team that can only be accessed by them and their clusters (see the sketch below)
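
A minimal sketch of cross-project pull access; the service-account email and project IDs are placeholders. gcr.io storage is backed by the artifacts.<project>.appspot.com bucket:

    # grant a cluster's node service account read access to a team's registry
    gsutil iam ch \
        serviceAccount:gke-nodes@cluster-project.iam.gserviceaccount.com:objectViewer \
        gs://artifacts.team-project.appspot.com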

GKE Beta/Alpha Features

  • Binary Authorization (beta)
  • Application-layer Secrets Encryption (beta)
  • Node auto-provisioning (beta)
  • GKE usage metering (beta)
  • Vertical Pod Autoscaling (beta)
  • Cloud Run on GKE (beta)
  • istio-on-gke (beta)
  • Managed Certificates (beta), requires 1.12.6 or higher
  • Alpha clusters to test the latest k8s greatness
  • GKE Sandbox

GKE Gotchas

  • Do not change NodePort on services. Make sure when you apply a service YAML, you do not update the nodePort the service is already using, or you will have downtime.
  • You cannot use a cluster master, node, Pod, or Service IP range that overlaps with 172.17.0.0/16.
  • While GKE can detect overlap with the cluster master address block, it cannot detect overlap within a shared VPC network.
  • Make sure your firewall rules allow GCP health checks (see the sketch after this list)
  • Read the Docs https://cloud.google.com/kubernetes-engine/docs/how-to/private-clusters#restrictions
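
A minimal sketch; the network name and port range are placeholders, while the source ranges are Google's documented health-check ranges:

    gcloud compute firewall-rules create allow-gcp-healthchecks \
        --network=my-shared-vpc \
        --source-ranges=130.211.0.0/22,35.191.0.0/16 \
        --allow=tcp:30000-32767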
