Intel and ARM, let Kubernetes rule them all!

A Swedish-speaking second-year upper secondary school (high school) student from Finland

A person who has never attended a computing class :)

A Kubernetes maintainer for about a year

The "kubernetes-on-arm" guy

$ whoami

I worked on and maintained minikube in the early days of the project, until I...

However, I wasn't satisfied with a side project; I wanted it in core, so I implemented multiarch support for Kubernetes in spring 2016. I also wrote a multi-platform proposal.

My first open source project was kubernetes-on-arm. It was the first easy solution for running Kubernetes on Raspberry Pis.

...moved on to kubeadm in August 2016, and started focusing on SIG-Cluster-Lifecycle issues, which I find very interesting and challenging.

What have I been tinkering with?


Motivation and reasoning

Platform agnostic. The specifications developed will not be platform specific such that they can be implemented on a variety of architectures and operating systems.

 -- CNCF Values

Why is the multi-platform functionality important for Kubernetes long-term?

$ kubectl motivate multiplatform

1. We don't know which platform will dominate 20 years from now.

2. By letting new architectures join the project, and more people with them, we'll see a stronger ecosystem and healthy competition.

3. The risk of vendor lock-in on the default platform is significantly reduced

What could Kubernetes on ARM be used for right now?

KubeCloud: A Small-Scale Tangible Cloud Computing Environment

 - A master's thesis about teaching Kubernetes concepts by letting students use Kubernetes on small Raspberry Pi clusters.

Microsoft Pledges to Use ARM Server Chips, Threatening Intel's Dominance

- The world's first 10nm processor is an ARM processor, exciting times!

In classrooms: teaching others how Kubernetes works with Raspberry Pis is an ideal way to let newcomers actually see what it's all about.

Since kubeadm was announced, it has been super easy to set up Kubernetes in an official way on ARM, and now also on ppc64le and s390x.

Example setup on an ARM machine:

$ curl -s | apt-key add -
$ cat <<EOF > /etc/apt/sources.list.d/kubernetes.list
deb kubernetes-xenial main
EOF
$ apt-get update && apt-get install -y kubeadm

$ kubeadm init

$ kubectl apply -f

$ # DONE!

TL;DR: Kubernetes shouldn't have different install paths for different platforms; it should just work out of the box.

How can I set up Kubernetes on another architecture?

Oh, wow, how does that work under the hood?

Quick intro to cross-compiling and manifest lists

Kubernetes releases server binaries for all supported architectures (amd64, arm, arm64, ppc64le, s390x) and node binaries for all supported platforms (+windows/amd64)

All docker images in the core k8s repo are built and pushed for all architectures using a semi-standardized Makefile.

Debian packages are provided for all architectures as well; the packaging step basically just downloads the binaries and wraps them in .deb files.

kubeadm detects which architecture it's running on at init time and generates manifests for the right architecture.
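As a rough illustration of that architecture-aware step, here is a minimal Go sketch. It assumes a hypothetical `REPO/COMPONENT-ARCH:VERSION` naming scheme and a made-up registry name, `registry.example.com`; it is not kubeadm's actual code. Go's `runtime.GOARCH` holds the architecture the binary was compiled for, so a binary built for arm64 naturally produces arm64 image names.

```go
package main

import (
	"fmt"
	"runtime"
)

// imageForArch builds a per-architecture image reference using the
// hypothetical pattern REPO/COMPONENT-ARCH:VERSION.
func imageForArch(repo, component, arch, version string) string {
	return fmt.Sprintf("%s/%s-%s:%s", repo, component, arch, version)
}

func main() {
	// runtime.GOARCH is the architecture this binary was compiled for, so a
	// binary built with GOARCH=arm64 prints arm64 image names here.
	// "registry.example.com" and "v1.6.0" are placeholders.
	for _, c := range []string{"kube-apiserver", "kube-controller-manager", "kube-scheduler"} {
		fmt.Println(imageForArch("registry.example.com", c, runtime.GOARCH, "v1.6.0"))
	}
}
```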

How does it work under the hood?

Binaries and docker images released by Kubernetes are cross-compiled and cross-built for non-amd64 architectures.

$ # Cross-compile main.go to ARM 32-bit
$ GOOS=linux GOARCH=arm CGO_ENABLED=0 go build main.go
$ # Cross-compile main.go (which contains CGO code) to ARM 32-bit
$ GOOS=linux GOARCH=arm CGO_ENABLED=1 CC=arm-linux-gnueabihf-gcc go build main.go

Cross-compilation with Go is relatively easy. Cross-building is a little harder, since one may have to use QEMU to emulate another architecture:

$ # Cross-build an armhf image with a RUN command that is executed on an amd64 host
$ cat Dockerfile
FROM armhf/debian:jessie
COPY qemu-arm-static /usr/bin/
RUN apt-get update && apt-get install -y iptables nfs-common
COPY hyperkube /

$ # Register QEMU as a binfmt_misc handler in the kernel and download the static QEMU binary
$ docker run --rm --privileged multiarch/qemu-user-static:register --reset
$ curl -sSL | tar -xz

$ docker build -t .
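The Dockerfile above copies `qemu-arm-static`. For the other architectures, the QEMU user-mode binary has a different name, because QEMU uses its own architecture names (for example `aarch64` where Go says `arm64`). A build script therefore needs a small mapping like the hypothetical Go helper below. It only covers the non-amd64 Linux architectures mentioned in this talk.

```go
package main

import "fmt"

// qemuArch maps a Go architecture name to the architecture name used in
// QEMU's user-mode static binaries (qemu-<arch>-static).
// Only the non-amd64 architectures discussed in this talk are covered.
func qemuArch(goarch string) (string, error) {
	switch goarch {
	case "arm":
		return "arm", nil
	case "arm64":
		return "aarch64", nil
	case "ppc64le":
		return "ppc64le", nil
	case "s390x":
		return "s390x", nil
	}
	return "", fmt.Errorf("no QEMU mapping for GOARCH %q", goarch)
}

func main() {
	for _, a := range []string{"arm", "arm64", "ppc64le", "s390x"} {
		q, err := qemuArch(a)
		if err != nil {
			panic(err)
		}
		// The binary to COPY into the image, like qemu-arm-static above.
		fmt.Printf("%-8s -> qemu-%s-static\n", a, q)
	}
}
```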

A quick recap on cross-compiling and cross-building

I don't want to have the architecture in the image name!!

Me neither. Enter manifest lists.

Imagine this scenario...

$ go build my-cool-app.go
$ docker build -t luxas/my-cool-app-amd64:v1.0.0 .
$ docker push luxas/my-cool-app-amd64:v1.0.0

$ # ARM
$ GOARCH=arm go build my-cool-app.go
$ docker build -t luxas/my-cool-app-arm:v1.0.0 .
$ docker push luxas/my-cool-app-arm:v1.0.0

$ # ARM 64-bit
$ GOARCH=arm64 go build my-cool-app.go
$ docker build -t luxas/my-cool-app-arm64:v1.0.0 .
$ docker push luxas/my-cool-app-arm64:v1.0.0

Then you get excited and create a k8s cluster of amd64, arm and arm64 nodes

and try to run your application on that cluster. But what architecture should you use?

$ kubectl run --image luxas/my-cool-app-???:v1.0.0 my-cool-app --port 80
$ kubectl expose deployment my-cool-app --port 80

This is the hardest problem with a multi-platform cluster: if you hardcode the architecture here, the deployment will fail on all other machines. Ideally, I would like to do this:

$ kubectl run --image luxas/my-cool-app:v1.0.0 my-cool-app --port 80
$ kubectl expose deployment my-cool-app --port 80

Fortunately, that's totally possible!

"Manifest list" is currently a Docker registry and client feature only, but I hope the general idea can propagate to other CRI implementations in the future.

The idea is very simple: you have one tag (e.g. luxas/my-cool-app:v1.0.0) that serves as a "redirector" to platform-specific images. The client then downloads the right image digest based on the platform it's running on.

Ok, so now that I know what a manifest list is, how do I create it?

$ go build my-app.go
$ docker build -t luxas/my-app-amd64:v1.0.0 .
$ docker push luxas/my-app-amd64:v1.0.0

$ # ARM
$ GOARCH=arm go build my-app.go
$ docker build -t luxas/my-app-arm:v1.0.0 .
$ docker push luxas/my-app-arm:v1.0.0

$ # ARM 64-bit
$ GOARCH=arm64 go build my-app.go
$ docker build -t luxas/my-app-arm64:v1.0.0 .
$ docker push luxas/my-app-arm64:v1.0.0

$ wget
$ mv manifest-tool-linux-amd64 manifest-tool && chmod +x manifest-tool
$ export PLATFORMS=linux/amd64,linux/arm,linux/arm64
$ # --platforms: which platforms the manifest list includes
$ # --template:  ARCH is a placeholder for the real architecture
$ # --target:    the name of the resulting manifest list
$ ./manifest-tool push from-args \
    --platforms $PLATFORMS \
    --template luxas/my-app-ARCH:v1.0.0 \
    --target luxas/my-app:v1.0.0
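The `--template`/`--platforms` combination boils down to string substitution: one source image per platform, with `ARCH` replaced by the architecture part of each `os/arch` entry. The Go sketch below illustrates that expansion. It is a hypothetical helper, not manifest-tool's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// expand returns one (platform, image) pair per comma-separated "os/arch"
// entry, replacing the ARCH placeholder in template with the arch part.
func expand(platforms, template string) ([][2]string, error) {
	var out [][2]string
	for _, p := range strings.Split(platforms, ",") {
		parts := strings.SplitN(p, "/", 2)
		if len(parts) != 2 {
			return nil, fmt.Errorf("malformed platform %q", p)
		}
		out = append(out, [2]string{p, strings.ReplaceAll(template, "ARCH", parts[1])})
	}
	return out, nil
}

func main() {
	pairs, err := expand("linux/amd64,linux/arm,linux/arm64", "luxas/my-app-ARCH:v1.0.0")
	if err != nil {
		panic(err)
	}
	for _, pr := range pairs {
		fmt.Printf("%s -> %s\n", pr[0], pr[1])
	}
}
```

With the PLATFORMS value used above, this prints `linux/amd64 -> luxas/my-app-amd64:v1.0.0` and the matching lines for arm and arm64, i.e. exactly the three images pushed earlier.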


 - The first release I participated in, I made the release bundle include ARM 32-bit binaries


 - Server docker images are released for ARM, both 32 and 64-bit

 - kubelet chooses the right pause image and registers itself with{os,arch}


 - kubeadm released as an official deployment method that supports ARM 32 and 64-bit

 - Unfortunately, I had to use a patched Golang version for building ARM 32-bit binaries...


 - The patched Golang version for ARM could be removed.

 - I re-enabled ppc64le builds, and the community contributed s390x builds.

How has the Kubernetes road to multiarch been?


Set up a cluster consisting of

2x Up Board

2x Odroid C2

3x Raspberry Pi 3

With kubeadm, this is easy:

KUBE_HYPERKUBE_IMAGE=luxas/hyperkube:v1.6.0-kubeadm-workshop-2 kubeadm-new init --config kubeadm.yaml

sudo cp /etc/kubernetes/admin.conf $HOME/
sudo chown $(id -u):$(id -g) $HOME/admin.conf
export KUBECONFIG=$HOME/admin.conf

kubectl apply -f weave.yaml

kubectl taint no pi5

kubectl taint no pi6 pi7

# Create the Dashboard Deployment and Service
kubectl apply -f demos/dashboard/dashboard.yaml

# Create the Heapster Deployment and Service
kubectl apply -f demos/monitoring/heapster.yaml

# Deploy Traefik as the Ingress Controller and use Ngrok to 
# expose the Traefik Service to the Internet
kubectl apply -f demos/loadbalancing/traefik-common.yaml
kubectl apply -f demos/loadbalancing/traefik-ngrok.yaml

# Expose the Dashboard to the world
kubectl apply -f demos/dashboard/ingress.yaml

# Get the public ngrok URL
curl -sSL $(kubectl -n kube-system get svc ngrok -o template --template \
    "{{.spec.clusterIP}}")/api/tunnels | jq  ".tunnels[].public_url" | sed 's/"//g;/http:/d'

# Create InfluxDB and Grafana for saving the Heapster data
kubectl apply -f demos/monitoring/influx-grafana.yaml

# Create the Prometheus Operator, a Prometheus instance and a sample metrics app
kubectl apply -f demos/monitoring/prometheus-operator.yaml
kubectl apply -f demos/monitoring/sample-prometheus-instance.yaml

# Create a Custom Metrics API server
kubectl apply -f demos/monitoring/custom-metrics.yaml
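The ngrok one-liner in the commands above pipes the `/api/tunnels` response through `jq ".tunnels[].public_url"` and then uses `sed` to strip the quotes and drop plain `http:` URLs. Here is the same filtering in Go, as a sketch that models only the `public_url` field the jq filter reads. The sample response and hostname are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// tunnels models only the field that the jq filter reads.
type tunnels struct {
	Tunnels []struct {
		PublicURL string `json:"public_url"`
	} `json:"tunnels"`
}

// publicURLs mirrors `jq ".tunnels[].public_url" | sed 's/"//g;/http:/d'`:
// collect every public_url and drop the ones containing "http:".
func publicURLs(body []byte) ([]string, error) {
	var t tunnels
	if err := json.Unmarshal(body, &t); err != nil {
		return nil, err
	}
	var urls []string
	for _, tn := range t.Tunnels {
		if !strings.Contains(tn.PublicURL, "http:") {
			urls = append(urls, tn.PublicURL)
		}
	}
	return urls, nil
}

func main() {
	// Hypothetical response body with one https and one http tunnel.
	sample := []byte(`{"tunnels":[{"public_url":"https://abc123.ngrok.io"},{"public_url":"http://abc123.ngrok.io"}]}`)
	urls, err := publicURLs(sample)
	if err != nil {
		panic(err)
	}
	for _, u := range urls {
		fmt.Println(u)
	}
}
```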
$ kubectl get no -owide
NAME       STATUS     AGE       VERSION                                    EXTERNAL-IP   OS-IMAGE                        KERNEL-VERSION
pi5        Ready      42m       v1.6.0-beta.4                              <none>        Debian GNU/Linux 8 (jessie)     4.9.13-bee42-v8
pi6        Ready      43m       v1.6.0-beta.4                              <none>        Raspbian GNU/Linux 8 (jessie)   4.4.50-hypriotos-v7+
pi7        NotReady   43m       v1.6.0-beta.4                              <none>        Raspbian GNU/Linux 8 (jessie)   4.4.50-hypriotos-v7+
upboard1   Ready      46m       v1.7.0-alpha.0.1446+33eb8794c93d5b-dirty   <none>        Ubuntu 16.04.2 LTS              4.4.0-67-generic
upboard2   NotReady   43m       v1.6.0-beta.4                              <none>        Ubuntu 16.04.2 LTS              4.4.0-66-generic

$ kubectl get po --all-namespaces -owide
NAMESPACE        NAME                                                   READY     STATUS    RESTARTS   AGE       IP                NODE
custom-metrics   custom-metrics-apiserver-2410399496-j6dg1              1/1       Running   0          22m         pi6
default          prometheus-operator-1505754769-n7kj8                   1/1       Running   0          30m         upboard2
default          prometheus-sample-metrics-prom-0                       2/2       Running   0          29m         upboard2
default          sample-metrics-app-2440858958-1h5wf                    1/1       Running   0          1m         pi5
default          sample-metrics-app-2440858958-35fdz                    1/1       Running   0          1m        upboard2
default          sample-metrics-app-2440858958-56r2x                    1/1       Running   0          1m         upboard2
default          sample-metrics-app-2440858958-9grc1                    1/1       Running   0          29m         pi5
default          sample-metrics-app-2440858958-f5w1t                    1/1       Running   0          4m         pi5
default          sample-metrics-app-2440858958-km3gq                    1/1       Running   0          12m         pi6
default          sample-metrics-app-2440858958-lntqp                    1/1       Running   0          1m         pi6
default          sample-metrics-app-2440858958-nst8h                    1/1       Running   0          4m         pi6
kube-system      etcd-upboard1                                          1/1       Running   0          44m   upboard1
kube-system      heapster-57121549-mtx6f                                1/1       Running   0          41m         upboard2
kube-system      kube-dns-3913472980-l3rkl                              3/3       Running   0          44m         upboard1
kube-system      kube-proxy-0jwxh                                       1/1       Running   0          42m   pi5
kube-system      kube-proxy-7ks9n                                       1/1       Running   0          45m   upboard1
kube-system      kube-proxy-ktxqd                                       1/1       Running   0          43m   upboard2
kube-system      kube-proxy-snp6v                                       1/1       Running   0          43m   pi6
kube-system      kubernetes-dashboard-2731141917-rdbj2                  1/1       Running   0          41m         upboard2
kube-system      monitoring-grafana-4071825559-rbs3w                    1/1       Running   0          34m         pi5
kube-system      monitoring-influxdb-1373127269-pzwhx                   1/1       Running   0          34m         pi5
kube-system      ngrok-3984100120-f5900                                 1/1       Running   0          41m         upboard2
kube-system      pv-controller-manager-3769581161-dcn66                 1/1       Running   0          40m         pi6
kube-system      self-hosted-kube-apiserver-kk6hk                       1/1       Running   1          45m   upboard1
kube-system      self-hosted-kube-controller-manager-1546170996-40n6g   1/1       Running   0          45m   upboard1
kube-system      self-hosted-kube-scheduler-3991062876-6s94c            1/1       Running   1          45m   upboard1
kube-system      traefik-ingress-controller-3665677306-f5dhj            1/1       Running   0          41m         pi5
kube-system      weave-net-3h3xm                                        2/2       Running   0          42m   pi5
kube-system      weave-net-f7wwj                                        2/2       Running   0          43m   upboard2
kube-system      weave-net-kxcr2                                        2/2       Running   0          43m   pi6
kube-system      weave-net-n3tvh                                        2/2       Running   0          44m   upboard1
wardle           wardle-apiserver-3982025089-3grzx                      2/2       Running   0          32m        upboard2
$ kubectl get svc --all-namespaces
NAMESPACE        NAME                         CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
custom-metrics   api                   <none>        443/TCP          35m
default          kubernetes                 <none>        443/TCP          52m
default          prometheus-operated          None             <none>        9090/TCP         35m
default          sample-metrics-app     <none>        8080/TCP         35m
default          sample-metrics-prom    <nodes>       9090:30999/TCP   35m
kube-system      heapster              <none>        80/TCP           48m
kube-system      kube-dns                  <none>        53/UDP,53/TCP    52m
kube-system      kubernetes-dashboard    <none>        80/TCP           48m
kube-system      monitoring-grafana    <none>        80/TCP           44m
kube-system      monitoring-influxdb    <none>        8086/TCP         44m
kube-system      ngrok                 <none>        80/TCP           48m
kube-system      traefik-ingress-controller     <none>        80/TCP           48m
kube-system      traefik-web            <none>        80/TCP           48m
wardle           api                      <none>        443/TCP          39m

$ kubectl top node
NAME       CPU(cores)   CPU%      MEMORY(bytes)   MEMORY%   
pi5        352m         8%        487Mi           56%       
upboard1   428m         10%       984Mi           75%       
pi6        414m         10%       449Mi           58%       
$ kubectl api-versions

$ kubectl apply -f demos/sample-apiserver/my-flunder.yaml 
flunder "my-first-flunder" configured

$ kubectl get flunders
NAME               KIND
$ curl -sSLk\
{
  "kind": "MetricValueList",
  "apiVersion": "",
  "metadata": {},
  "items": [
    {
      "describedObject": {
        "kind": "Service",
        "namespace": "default",
        "name": "sample-metrics-app",
        "apiVersion": "/__internal"
      },
      "metricName": "http_requests_total",
      "timestamp": "2017-03-24T13:14:13Z",
      "window": 60,
      "value": "299m"
    }
  ]
}

$ kubectl get hpa
NAME                     REFERENCE                       TARGETS      MINPODS   MAXPODS   REPLICAS   AGE
sample-metrics-app-hpa   Deployment/sample-metrics-app   333m / 100   2         10        10         31m
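Both the metric value above (`"299m"`) and the HPA TARGETS column (`333m / 100`) use Kubernetes quantity notation, where the `m` suffix means milli-units, so `299m` is 0.299. The HPA compares the current metric with its target and scales accordingly. The core rule documented for the Horizontal Pod Autoscaler is desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric), bounded by MINPODS and MAXPODS. The Go sketch below does both steps with integer milli-units to avoid floating-point rounding. Its quantity parser only understands plain integers and the `m` suffix (the real `resource.Quantity` type supports far more), the real controller also applies a tolerance and other safeguards that are left out here, and the inputs in `main` are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMilli converts an integer quantity ("299m" or "2") into milli-units.
func parseMilli(q string) (int64, error) {
	if strings.HasSuffix(q, "m") {
		return strconv.ParseInt(strings.TrimSuffix(q, "m"), 10, 64)
	}
	v, err := strconv.ParseInt(q, 10, 64)
	return v * 1000, err
}

// desiredReplicas applies ceil(current * currentMilli / targetMilli) using
// integer arithmetic and clamps the result to [minR, maxR].
func desiredReplicas(current, currentMilli, targetMilli, minR, maxR int64) (int64, error) {
	if targetMilli <= 0 {
		return 0, fmt.Errorf("target must be positive, got %d", targetMilli)
	}
	d := (current*currentMilli + targetMilli - 1) / targetMilli
	if d < minR {
		d = minR
	}
	if d > maxR {
		d = maxR
	}
	return d, nil
}

func main() {
	v, err := parseMilli("299m")
	if err != nil {
		panic(err)
	}
	fmt.Println("299m in milli-units:", v)

	// Hypothetical: 3 pods averaging 250m against a 100m target, bounds 2..10.
	cur, _ := parseMilli("250m")
	target, _ := parseMilli("100m")
	d, err := desiredReplicas(3, cur, target, 2, 10)
	if err != nil {
		panic(err)
	}
	fmt.Println("desired replicas:", d)
}
```

For the hypothetical inputs, 3 * 250 / 100 = 7.5, which rounds up to 8 replicas.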

What now?

Roadmap and help-wanted issues

The current situation works, but it could obviously be improved. Here are some shout-outs to the community:

- Automated CI testing for the other architectures using kubeadm

    - We might be able to use the CNCF cluster here?

- Formalize a standard specification for how Kubernetes binaries should be compiled and how server images should be built

    - Official Kubernetes projects should publish binaries for at least amd64, arm, arm64, ppc64le, s390x and windows (node only)

- Manifest lists should be built for the server images

    - This is blocked on not supporting v2 schema 2 :(

- Implement this feature in other CRI-compliant implementations

- Create an external admission controller that applies platform data

What's yet to be done here?

Thank you for listening!


Bochum Presentation

By lucask
