Intel and ARM, let Kubernetes rule them all!

A Swedish-speaking second-year Upper Secondary School (High School) Student from Finland

A person who has never attended a computing class :)

A Kubernetes maintainer for about a year now

The "kubernetes-on-arm" guy

$ whoami

I worked on and maintained minikube in the early days of the project, until I...

However, I wasn't satisfied with a side project; I wanted it in core, so I implemented multiarch support for Kubernetes in the spring of 2016. I also wrote a multi-platform proposal

My first open source project was kubernetes-on-arm. It was the first easy way to run Kubernetes on Raspberry Pis

...moved on to kubeadm in August 2016, and started focusing on SIG-Cluster-Lifecycle issues, which I find very interesting and challenging.

What have I been tinkering with?

Why?

Motivation and reasoning

Platform agnostic. The specifications developed will not be platform specific such that they can be implemented on a variety of architectures and operating systems.

 -- CNCF Values

Why is the multi-platform functionality important for Kubernetes long-term?

$ kubectl motivate multiplatform

1. We don't know which platform will be the dominant one 20 years from now

2. By letting new architectures join the project, and more people with them, we'll get a stronger ecosystem and sound competition.

3. The risk of vendor lock-in on the default platform is significantly reduced

What could Kubernetes on ARM be used for right now?

KubeCloud: A Small-Scale Tangible Cloud Computing Environment

 - A master's thesis on teaching Kubernetes concepts by letting students run Kubernetes on small Raspberry Pi clusters.

Microsoft Pledges to Use ARM Server Chips, Threatening Intel's Dominance

- The world's first 10nm processor is an ARM processor, exciting times!

In classrooms -- teaching others how Kubernetes works using Raspberry Pis is an ideal way of letting newcomers actually see what it's all about

Since kubeadm was announced, it has been super easy to set up Kubernetes the official way on ARM, and now also on ppc64le and s390x

Example setup on an ARM machine:

$ curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
$ cat <<EOF > /etc/apt/sources.list.d/kubernetes.list
deb http://apt.kubernetes.io/ kubernetes-xenial main
EOF
$ apt-get update && apt-get install -y docker.io kubeadm

$ kubeadm init
...

$ kubectl apply -f https://git.io/weave-kube-1.6
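
Additional nodes, whatever their architecture, can then be joined with the command printed at the end of 'kubeadm init' (token and address below are placeholders):

$ # Run on each worker node, using the token and master IP printed by 'kubeadm init'
$ kubeadm join --token <token> <master-ip>:6443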

$ # DONE!

TL;DR: Kubernetes shouldn't have different install paths for different platforms; it should just work out of the box

How can I set up Kubernetes on another architecture?

Oh, wow, how does that work under the hood?

Quick intro on cross-compiling
and manifest lists

Kubernetes releases server binaries for all supported architectures (amd64, arm, arm64, ppc64le, s390x) and node binaries for all supported platforms (+windows/amd64)
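
For example, a server binary for another architecture can be fetched straight from the release bucket (URL pattern from memory; adjust the version, OS and arch as needed):

$ # Download the v1.6.0 kubelet built for 64-bit ARM
$ curl -LO https://storage.googleapis.com/kubernetes-release/release/v1.6.0/bin/linux/arm64/kubelet
$ chmod +x kubelet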

All Docker images in the core k8s repo are built and pushed for all architectures using a semi-standardized Makefile.

Debian packages are provided for all architectures as well; they basically just download the binaries and package them as debs

kubeadm detects which architecture it's running on at init time and generates manifests for the right architecture.
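
The underlying idea is simple; here is a rough shell sketch (illustrative only, not kubeadm's actual code) of mapping the machine type to the right image suffix:

$ # Illustrative: map 'uname -m' output to the Docker image architecture suffix
$ case "$(uname -m)" in
    x86_64)        ARCH=amd64 ;;
    armv6l|armv7l) ARCH=arm ;;
    aarch64)       ARCH=arm64 ;;
    ppc64le)       ARCH=ppc64le ;;
    s390x)         ARCH=s390x ;;
  esac
$ echo "gcr.io/google_containers/kube-apiserver-${ARCH}:v1.6.0"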

How does it work under the hood?

Binaries and docker images released by Kubernetes are cross-compiled and cross-built for non-amd64 architectures.

$ # Cross-compile main.go to ARM 32-bit
$ GOOS=linux GOARCH=arm CGO_ENABLED=0 go build main.go
$ # Cross-compile main.go (which contains CGO code) to ARM 32-bit
$ GOOS=linux GOARCH=arm CGO_ENABLED=1 CC=arm-linux-gnueabihf-gcc go build main.go
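
A quick way to verify such a build is the file(1) command; for the ARM build above it should report something like this (exact output varies with the toolchain and CGO settings):

$ file main
main: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked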

Cross-compilation with Go is relatively easy. Cross-building Docker images is a little harder; one may have to use QEMU to emulate the other architecture:

$ # Cross-build an armhf image with a RUN command that is executed on an amd64 host
$ cat Dockerfile
FROM armhf/debian:jessie
COPY qemu-arm-static /usr/bin/
RUN apt-get update && apt-get install -y iptables nfs-common
COPY hyperkube /

$ # Register the binfmt_misc module in the kernel and download QEMU
$ docker run --rm --privileged multiarch/qemu-user-static:register --reset
$ curl -sSL https://foo-qemu-download.com/x86_64_qemu-arm-static.tar.gz | tar -xz

$ docker build -t gcr.io/google_containers/hyperkube-arm:v1.x.y .
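
Thanks to the registered binfmt_misc handler and the qemu-arm-static binary baked into the image, the ARM image can even be run on the amd64 build host as a sanity check (illustrative; QEMU should report an ARM machine type):

$ docker run --rm gcr.io/google_containers/hyperkube-arm:v1.x.y uname -m
armv7l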

A quick recap on cross-compiling and cross-building

I don't want to have the architecture in the image name!!

Me neither. Enter manifest lists.

Imagine this scenario...

$ go build my-cool-app.go
$ docker build -t luxas/my-cool-app-amd64:v1.0.0 .
...
$ docker push luxas/my-cool-app-amd64:v1.0.0

$ # ARM
$ GOARCH=arm go build my-cool-app.go
$ docker build -t luxas/my-cool-app-arm:v1.0.0 .
...
$ docker push luxas/my-cool-app-arm:v1.0.0

$ # ARM 64-bit
$ GOARCH=arm64 go build my-cool-app.go
$ docker build -t luxas/my-cool-app-arm64:v1.0.0 .
...
$ docker push luxas/my-cool-app-arm64:v1.0.0

Then you get excited and create a k8s cluster of amd64, arm and arm64 nodes

and try to run your application on that cluster. But what architecture should you use?

$ kubectl run --image luxas/my-cool-app-???:v1.0.0 my-cool-app --port 80
$ kubectl expose deployment my-cool-app --port 80

This is the hardest problem with a multi-platform cluster: if you hardcode the architecture here, it will fail on machines of every other architecture. Ideally, I would like to do this:

$ kubectl run --image luxas/my-cool-app:v1.0.0 my-cool-app --port 80
$ kubectl expose deployment my-cool-app --port 80

Fortunately, that's totally possible!

"Manifest list" is currently a Docker registry and client feature only, but I hope the general idea can propagate to other CRI implementations in the future.

The idea is very simple: you have one tag (e.g. luxas/my-cool-app:v1.0.0) that serves as a "redirector" to platform-specific images. The client then downloads the image digest that matches the platform it's running on.

Ok, so now that I know what a manifest list is, how do I create it?

$ go build my-app.go
$ docker build -t luxas/my-app-amd64:v1.0.0 .
...
$ docker push luxas/my-app-amd64:v1.0.0

$ # ARM
$ GOARCH=arm go build my-app.go
$ docker build -t luxas/my-app-arm:v1.0.0 .
...
$ docker push luxas/my-app-arm:v1.0.0

$ # ARM 64-bit
$ GOARCH=arm64 go build my-app.go
$ docker build -t luxas/my-app-arm64:v1.0.0 .
...
$ docker push luxas/my-app-arm64:v1.0.0

$ wget https://github.com/estesp/manifest-tool/releases/download/v0.4.0/manifest-tool-linux-amd64
$ mv manifest-tool-linux-amd64 manifest-tool && chmod +x manifest-tool
$ export PLATFORMS=linux/amd64,linux/arm,linux/arm64
$ # --platforms: the platforms the manifest list should include
$ # --template:  ARCH is a placeholder for the real architecture
$ # --target:    the name of the resulting manifest list
$ ./manifest-tool push from-args \
    --platforms $PLATFORMS \
    --template luxas/my-app-ARCH:v1.0.0 \
    --target luxas/my-app:v1.0.0
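
To sanity-check the result, manifest-tool can inspect the pushed list, and pulling the plain tag on any node should resolve to the matching per-architecture image (assuming the registry and Docker version support manifest lists):

$ # Show the per-architecture images behind the manifest list tag
$ ./manifest-tool inspect luxas/my-app:v1.0.0

$ # On an ARM node this pulls the linux/arm image; on amd64, the linux/amd64 one
$ docker pull luxas/my-app:v1.0.0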

v1.2:

 - The first release I participated in; I made the release bundle include ARM 32-bit binaries

v1.3:

 - Server docker images are released for ARM, both 32 and 64-bit

 - kubelet chooses the right pause image and registers itself with the beta.kubernetes.io/{os,arch} labels

v1.4:

 - kubeadm released as an official deployment method that supports ARM 32 and 64-bit

 - Unfortunately, I had to use a patched Golang version for building ARM 32-bit binaries...

v1.6:

 - The patched Golang version for ARM could be removed.

 - I reenabled ppc64le builds and the community contributed s390x builds.

How has the Kubernetes road to multiarch been?

Demo!

Set up a cluster consisting of

2x Up Board

2x Odroid C2

3x Raspberry Pi 3

With kubeadm this gets easy

KUBE_HYPERKUBE_IMAGE=luxas/hyperkube:v1.6.0-kubeadm-workshop-2 kubeadm-new init --config kubeadm.yaml

sudo cp /etc/kubernetes/admin.conf $HOME/
sudo chown $(id -u):$(id -g) $HOME/admin.conf
export KUBECONFIG=$HOME/admin.conf

kubectl apply -f weave.yaml

kubectl taint no pi5 beta.kubernetes.io/arch=arm64:NoSchedule

kubectl taint no pi6 pi7 beta.kubernetes.io/arch=arm:NoSchedule
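
Those taints keep amd64-only workloads off the ARM nodes; a workload that does ship multi-arch images can opt back in with a toleration, roughly like this (illustrative manifest, reusing the manifest-list image from earlier):

$ cat <<EOF | kubectl apply -f -
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: multiarch-demo
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: multiarch-demo
    spec:
      # Tolerate the arch taints so pods may land on the arm/arm64 nodes too
      tolerations:
      - key: beta.kubernetes.io/arch
        operator: Exists
        effect: NoSchedule
      containers:
      - name: app
        image: luxas/my-cool-app:v1.0.0
EOF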

# Create the Dashboard Deployment and Service
kubectl apply -f demos/dashboard/dashboard.yaml

# Create the Heapster Deployment and Service
kubectl apply -f demos/monitoring/heapster.yaml

# Deploy Traefik as the Ingress Controller and use Ngrok to 
# expose the Traefik Service to the Internet
kubectl apply -f demos/loadbalancing/traefik-common.yaml
kubectl apply -f demos/loadbalancing/traefik-ngrok.yaml

# Expose the Dashboard to the world
kubectl apply -f demos/dashboard/ingress.yaml

# Get the public ngrok URL
curl -sSL $(kubectl -n kube-system get svc ngrok -o template --template \
    "{{.spec.clusterIP}}")/api/tunnels | jq  ".tunnels[].public_url" | sed 's/"//g;/http:/d'

# Create InfluxDB and Grafana for saving the Heapster data
kubectl apply -f demos/monitoring/influx-grafana.yaml

# Create the Prometheus Operator, a Prometheus instance and a sample metrics app
kubectl apply -f demos/monitoring/prometheus-operator.yaml
kubectl apply -f demos/monitoring/sample-prometheus-instance.yaml

# Create a Custom Metrics API server
kubectl apply -f demos/monitoring/custom-metrics.yaml
$ kubectl get no -owide
NAME       STATUS     AGE       VERSION                                    EXTERNAL-IP   OS-IMAGE                        KERNEL-VERSION
pi5        Ready      42m       v1.6.0-beta.4                              <none>        Debian GNU/Linux 8 (jessie)     4.9.13-bee42-v8
pi6        Ready      43m       v1.6.0-beta.4                              <none>        Raspbian GNU/Linux 8 (jessie)   4.4.50-hypriotos-v7+
pi7        NotReady   43m       v1.6.0-beta.4                              <none>        Raspbian GNU/Linux 8 (jessie)   4.4.50-hypriotos-v7+
upboard1   Ready      46m       v1.7.0-alpha.0.1446+33eb8794c93d5b-dirty   <none>        Ubuntu 16.04.2 LTS              4.4.0-67-generic
upboard2   NotReady   43m       v1.6.0-beta.4                              <none>        Ubuntu 16.04.2 LTS              4.4.0-66-generic

$ kubectl get po --all-namespaces -owide
NAMESPACE        NAME                                                   READY     STATUS    RESTARTS   AGE       IP                NODE
custom-metrics   custom-metrics-apiserver-2410399496-j6dg1              1/1       Running   0          22m       10.47.0.2         pi6
default          prometheus-operator-1505754769-n7kj8                   1/1       Running   0          30m       10.44.0.4         upboard2
default          prometheus-sample-metrics-prom-0                       2/2       Running   0          29m       10.44.0.8         upboard2
default          sample-metrics-app-2440858958-1h5wf                    1/1       Running   0          1m        10.45.0.6         pi5
default          sample-metrics-app-2440858958-35fdz                    1/1       Running   0          1m        10.44.0.11        upboard2
default          sample-metrics-app-2440858958-56r2x                    1/1       Running   0          1m        10.44.0.9         upboard2
default          sample-metrics-app-2440858958-9grc1                    1/1       Running   0          29m       10.45.0.4         pi5
default          sample-metrics-app-2440858958-f5w1t                    1/1       Running   0          4m        10.45.0.5         pi5
default          sample-metrics-app-2440858958-km3gq                    1/1       Running   0          12m       10.47.0.3         pi6
default          sample-metrics-app-2440858958-lntqp                    1/1       Running   0          1m        10.47.0.5         pi6
default          sample-metrics-app-2440858958-nst8h                    1/1       Running   0          4m        10.47.0.4         pi6
kube-system      etcd-upboard1                                          1/1       Running   0          44m       192.168.200.211   upboard1
kube-system      heapster-57121549-mtx6f                                1/1       Running   0          41m       10.44.0.2         upboard2
kube-system      kube-dns-3913472980-l3rkl                              3/3       Running   0          44m       10.32.0.2         upboard1
kube-system      kube-proxy-0jwxh                                       1/1       Running   0          42m       192.168.200.215   pi5
kube-system      kube-proxy-7ks9n                                       1/1       Running   0          45m       192.168.200.211   upboard1
kube-system      kube-proxy-ktxqd                                       1/1       Running   0          43m       192.168.200.212   upboard2
kube-system      kube-proxy-snp6v                                       1/1       Running   0          43m       192.168.200.216   pi6
kube-system      kubernetes-dashboard-2731141917-rdbj2                  1/1       Running   0          41m       10.44.0.1         upboard2
kube-system      monitoring-grafana-4071825559-rbs3w                    1/1       Running   0          34m       10.45.0.2         pi5
kube-system      monitoring-influxdb-1373127269-pzwhx                   1/1       Running   0          34m       10.45.0.3         pi5
kube-system      ngrok-3984100120-f5900                                 1/1       Running   0          41m       10.44.0.3         upboard2
kube-system      pv-controller-manager-3769581161-dcn66                 1/1       Running   0          40m       10.47.0.1         pi6
kube-system      self-hosted-kube-apiserver-kk6hk                       1/1       Running   1          45m       192.168.200.211   upboard1
kube-system      self-hosted-kube-controller-manager-1546170996-40n6g   1/1       Running   0          45m       192.168.200.211   upboard1
kube-system      self-hosted-kube-scheduler-3991062876-6s94c            1/1       Running   1          45m       192.168.200.211   upboard1
kube-system      traefik-ingress-controller-3665677306-f5dhj            1/1       Running   0          41m       10.45.0.1         pi5
kube-system      weave-net-3h3xm                                        2/2       Running   0          42m       192.168.200.215   pi5
kube-system      weave-net-f7wwj                                        2/2       Running   0          43m       192.168.200.212   upboard2
kube-system      weave-net-kxcr2                                        2/2       Running   0          43m       192.168.200.216   pi6
kube-system      weave-net-n3tvh                                        2/2       Running   0          44m       192.168.200.211   upboard1
wardle           wardle-apiserver-3982025089-3grzx                      2/2       Running   0          32m       10.44.0.10        upboard2
$ kubectl get svc --all-namespaces
NAMESPACE        NAME                         CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
custom-metrics   api                          10.100.246.198   <none>        443/TCP          35m
default          kubernetes                   10.96.0.1        <none>        443/TCP          52m
default          prometheus-operated          None             <none>        9090/TCP         35m
default          sample-metrics-app           10.97.141.133    <none>        8080/TCP         35m
default          sample-metrics-prom          10.105.118.16    <nodes>       9090:30999/TCP   35m
kube-system      heapster                     10.107.113.203   <none>        80/TCP           48m
kube-system      kube-dns                     10.96.0.10       <none>        53/UDP,53/TCP    52m
kube-system      kubernetes-dashboard         10.99.233.145    <none>        80/TCP           48m
kube-system      monitoring-grafana           10.105.105.151   <none>        80/TCP           44m
kube-system      monitoring-influxdb          10.99.193.162    <none>        8086/TCP         44m
kube-system      ngrok                        10.107.224.120   <none>        80/TCP           48m
kube-system      traefik-ingress-controller   10.102.162.4     <none>        80/TCP           48m
kube-system      traefik-web                  10.109.90.245    <none>        80/TCP           48m
wardle           api                          10.99.51.75      <none>        443/TCP          39m

$ kubectl top node
NAME       CPU(cores)   CPU%      MEMORY(bytes)   MEMORY%   
pi5        352m         8%        487Mi           56%       
upboard1   428m         10%       984Mi           75%       
pi6        414m         10%       449Mi           58%       
$ kubectl api-versions
apiregistration.k8s.io/v1alpha1
apps/v1beta1
authentication.k8s.io/v1
authentication.k8s.io/v1beta1
authorization.k8s.io/v1
authorization.k8s.io/v1beta1
autoscaling/v1
autoscaling/v2alpha1
batch/v1
batch/v2alpha1
certificates.k8s.io/v1beta1
custom-metrics.metrics.k8s.io/v1alpha1
extensions/v1beta1
monitoring.coreos.com/v1alpha1
policy/v1beta1
rbac.authorization.k8s.io/v1alpha1
rbac.authorization.k8s.io/v1beta1
rook.io/v1beta1
settings.k8s.io/v1alpha1
storage.k8s.io/v1
storage.k8s.io/v1beta1
v1
wardle.k8s.io/v1alpha1

$ kubectl apply -f demos/sample-apiserver/my-flunder.yaml 
flunder "my-first-flunder" configured

$ kubectl get flunders
NAME               KIND
my-first-flunder   Flunder.v1alpha1.wardle.k8s.io
$ curl -sSLk https://10.100.246.198/apis/custom-metrics.metrics.k8s.io/v1alpha1\
            /namespaces/default/services/sample-metrics-app/http_requests_total
{
  "kind": "MetricValueList",
  "apiVersion": "custom-metrics.metrics.k8s.io/v1alpha1",
  "metadata": {},
  "items": [
    {
      "describedObject": {
        "kind": "Service",
        "namespace": "default",
        "name": "sample-metrics-app",
        "apiVersion": "/__internal"
      },
      "metricName": "http_requests_total",
      "timestamp": "2017-03-24T13:14:13Z",
      "window": 60,
      "value": "299m"
    }
  ]
}

$ kubectl get hpa
NAME                     REFERENCE                       TARGETS      MINPODS   MAXPODS   REPLICAS   AGE
sample-metrics-app-hpa   Deployment/sample-metrics-app   333m / 100   2         10        10         31m
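
For reference, the HPA above scales on the custom metric served by the custom-metrics API server; under autoscaling/v2alpha1 it looks roughly like this (field names from memory, verify against the API version in use):

$ cat <<EOF | kubectl apply -f -
apiVersion: autoscaling/v2alpha1
kind: HorizontalPodAutoscaler
metadata:
  name: sample-metrics-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: sample-metrics-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Object
    object:
      # Scale on the http_requests_total metric of the sample-metrics-app Service
      target:
        kind: Service
        name: sample-metrics-app
      metricName: http_requests_total
      targetValue: 100
EOF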

What now?

Roadmap and help-wanted issues

The current situation is ok and works, but it could obviously be improved. Here are some shout-outs to the community:

- Automated CI testing for the other architectures using kubeadm

    - We might be able to use the CNCF cluster here?

- Formalize a standard specification for how Kubernetes binaries should be compiled and how server images should be built

    - Official Kubernetes projects should publish binaries for at least amd64, arm, arm64, ppc64le, s390x and windows (node only)

- Manifest lists should be built for the server images

    - This is blocked on gcr.io not supporting v2 schema 2 :(

- Implement this feature in other CRI-compliant implementations

- Create an external Admission Controller that applies platform data

What's yet to be done here?

Thank you for listening!

github.com/luxas

twitter.com/kubernetesonarm

luxaslabs.com

 
