Intel and ARM, let Kubernetes rule them all!
$ whoami
A Swedish-speaking second-year upper secondary school (high school) student from Finland
A person who has never attended a computing class :)
A Kubernetes maintainer for the past year
The "kubernetes-on-arm" guy
What have I been tinkering with?
My first open source project was kubernetes-on-arm. It was the first easy way to run Kubernetes on Raspberry Pis.
However, I wasn't satisfied with a side project. I wanted it in core, so I implemented multiarch support for Kubernetes in spring 2016. I also wrote a multi-platform proposal.
I worked on and maintained minikube in the early days of the project, until I moved on to kubeadm in August 2016 and started focusing on SIG-Cluster-Lifecycle issues, which I find very interesting and challenging.
Why?
Motivation and reasoning
Platform agnostic. The specifications developed will not be platform specific such that they can be implemented on a variety of architectures and operating systems.
-- CNCF Values
Why is the multi-platform functionality important for Kubernetes long-term?
$ kubectl motivate multiplatform
1. We don't know which platform will be the dominant one 20 years from now.
2. By letting new architectures join the project, and more people with them, we'll see a stronger ecosystem and healthy competition.
3. The risk of vendor lock-in on the default platform is significantly reduced.
What could Kubernetes on ARM be used for right now?
KubeCloud: A Small-Scale Tangible Cloud Computing Environment
- A master's thesis about teaching Kubernetes concepts by letting students use Kubernetes on small Raspberry Pi clusters.
Microsoft Pledges to Use ARM Server Chips, Threatening Intel's Dominance
- The world's first 10nm processor is an ARM processor, exciting times!
In classrooms: teaching others how Kubernetes works using Raspberry Pis is the ideal way of letting newcomers actually see what it's all about.
Since kubeadm was announced, it has been super-easy to set up Kubernetes in an official way on ARM and now also on ppc64le and s390x
Example setup on an ARM machine:
$ curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
$ cat <<EOF > /etc/apt/sources.list.d/kubernetes.list
deb http://apt.kubernetes.io/ kubernetes-xenial main
EOF
$ apt-get update && apt-get install -y docker.io kubeadm
$ kubeadm init
...
$ kubectl apply -f https://git.io/weave-kube-1.6
$ # DONE!
TL;DR: Kubernetes shouldn't have different install paths for different platforms; it should just work out of the box.
How can I set up Kubernetes on another architecture?
Oh, wow, how does that work under the hood?
Quick intro to cross-compiling and manifest lists
Kubernetes releases server binaries for all supported architectures (amd64, arm, arm64, ppc64le, s390x) and node binaries for all supported platforms (+windows/amd64)
All docker images in the core k8s repo are built and pushed for all architectures using a semi-standardized Makefile.
Debian packages are provided for all architectures as well. The packaging basically just downloads the binaries and wraps them in debs.
kubeadm detects which architecture it's running on at init and generates manifests for the right architecture.
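The architecture awareness described above can be sketched in shell. This is only an illustration, not kubeadm's actual code (kubeadm is written in Go): the helper names `go_arch` and `arch_image` are hypothetical, and the `<name>-<arch>` image pattern is modeled on the `hyperkube-arm` image name used elsewhere in this talk.

```shell
#!/bin/sh
# Map a `uname -m` machine name to the Go-style architecture name.
# Hypothetical helper for illustration; not part of kubeadm.
go_arch() {
  case "$1" in
    x86_64)        echo amd64 ;;
    armv6l|armv7l) echo arm ;;
    aarch64|arm64) echo arm64 ;;
    ppc64le)       echo ppc64le ;;
    s390x)         echo s390x ;;
    *)             echo "unsupported machine: $1" >&2; return 1 ;;
  esac
}

# Compose an architecture-suffixed image name: <repo>/<name>-<arch>:<tag>
arch_image() {
  # $1 = repository, $2 = component name, $3 = tag, $4 = machine name
  arch=$(go_arch "$4") || return 1
  echo "$1/$2-$arch:$3"
}

# Example with a fixed machine name (a 32-bit ARM board)
arch_image gcr.io/google_containers hyperkube v1.x.y armv7l
```

On a real node you would pass `"$(uname -m)"` as the last argument instead of a fixed machine name.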
How does it work under the hood?
Binaries and docker images released by Kubernetes are cross-compiled and cross-built for non-amd64 architectures.
$ # Cross-compile main.go to ARM 32-bit
$ GOOS=linux GOARCH=arm CGO_ENABLED=0 go build main.go
$ # Cross-compile main.go (which contains CGO code) to ARM 32-bit
$ GOOS=linux GOARCH=arm CGO_ENABLED=1 CC=arm-linux-gnueabihf-gcc go build main.go
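The same pure-Go command can be repeated for every server architecture listed earlier. The sketch below only prints the commands (a dry run). The `build_cmd` helper and the `main-<arch>` output naming are assumptions made for illustration, not how the Kubernetes build system does it.

```shell
#!/bin/sh
# Server architectures named in this talk
ARCHES="amd64 arm arm64 ppc64le s390x"

# Print the cross-compile command for one architecture (hypothetical helper).
build_cmd() {
  # $1 = GOARCH value, $2 = Go source file
  echo "GOOS=linux GOARCH=$1 CGO_ENABLED=0 go build -o ${2%.go}-$1 $2"
}

for arch in $ARCHES; do
  build_cmd "$arch" main.go
done
```

Piping the output to `sh` would run the builds, assuming a Go toolchain is installed.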
Cross-compilation with Go is relatively easy. Cross-building images is a little harder: a RUN command executes binaries built for the target architecture, so one may have to use QEMU to emulate that architecture on the build host:
$ # Cross-build an armhf image with a RUN command that is executed on an amd64 host
$ cat Dockerfile
FROM armhf/debian:jessie
COPY qemu-arm-static /usr/bin/
RUN apt-get update && apt-get install -y iptables nfs-common
COPY hyperkube /
$ # Register the binfmt_misc module in the kernel and download QEMU
$ docker run --rm --privileged multiarch/qemu-user-static:register --reset
$ curl -sSL https://foo-qemu-download.com/x86_64_qemu-arm-static.tar.gz | tar -xz
$ docker build -t gcr.io/google_containers/hyperkube-arm:v1.x.y .
A quick recap on cross-compiling and cross-building
I don't want to have the architecture in the image name!!
Me neither. Enter manifest lists.
Imagine this scenario...
$ go build my-cool-app.go
$ docker build -t luxas/my-cool-app-amd64:v1.0.0 .
...
$ docker push luxas/my-cool-app-amd64:v1.0.0
$ # ARM
$ GOARCH=arm go build my-cool-app.go
$ docker build -t luxas/my-cool-app-arm:v1.0.0 .
...
$ docker push luxas/my-cool-app-arm:v1.0.0
$ # ARM 64-bit
$ GOARCH=arm64 go build my-cool-app.go
$ docker build -t luxas/my-cool-app-arm64:v1.0.0 .
...
$ docker push luxas/my-cool-app-arm64:v1.0.0
Then you get excited and create a k8s cluster of amd64, arm and arm64 nodes
and try to run your application on that cluster. But what architecture should you use?
$ kubectl run --image luxas/my-cool-app-???:v1.0.0 my-cool-app --port 80
$ kubectl expose deployment my-cool-app --port 80
This is the hardest problem with a multi-platform cluster: if you hardcode the architecture here, it will fail on all other machines. Ideally I would like to do this:
$ kubectl run --image luxas/my-cool-app:v1.0.0 my-cool-app --port 80
$ kubectl expose deployment my-cool-app --port 80
Fortunately, that's totally possible!
"Manifest list" is currently a Docker registry and client feature only, but I hope the general idea can propagate to other CRI implementations in the future.
The idea is very simple: you have one tag (e.g. luxas/my-cool-app:v1.0.0) that serves as a "redirector" to platform-specific images. The client then downloads the right image digest based on the platform it's running on.
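As a toy model of that redirect, the lookup can be sketched in shell. This is a conceptual illustration only: a real manifest list is a JSON document stored in the registry that points at image digests, and the client resolves it internally. The `MANIFEST_LIST` table and `resolve_image` helper are hypothetical.

```shell
#!/bin/sh
# A toy "manifest list": one line per platform, mapping it to the
# platform-specific image the shared tag redirects to.
MANIFEST_LIST='linux/amd64 luxas/my-cool-app-amd64:v1.0.0
linux/arm luxas/my-cool-app-arm:v1.0.0
linux/arm64 luxas/my-cool-app-arm64:v1.0.0'

# Print the image registered for a platform; fail if there is no entry.
resolve_image() {
  # $1 = platform in "os/arch" form
  match=$(printf '%s\n' "$MANIFEST_LIST" | awk -v p="$1" '$1 == p { print $2 }')
  [ -n "$match" ] && echo "$match"
}

# What an arm64 node would end up pulling
resolve_image linux/arm64
```

A platform without an entry yields no image, which mirrors what happens when a manifest list has no match for the node's platform: the pull fails.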
Ok, so now that I know what a manifest list is, how do I create it?
$ go build my-app.go
$ docker build -t luxas/my-app-amd64:v1.0.0 .
...
$ docker push luxas/my-app-amd64:v1.0.0
$ # ARM
$ GOARCH=arm go build my-app.go
$ docker build -t luxas/my-app-arm:v1.0.0 .
...
$ docker push luxas/my-app-arm:v1.0.0
$ # ARM 64-bit
$ GOARCH=arm64 go build my-app.go
$ docker build -t luxas/my-app-arm64:v1.0.0 .
...
$ docker push luxas/my-app-arm64:v1.0.0
$ wget https://github.com/estesp/manifest-tool/releases/download/v0.4.0/manifest-tool-linux-amd64
$ mv manifest-tool-linux-amd64 manifest-tool && chmod +x manifest-tool
$ export PLATFORMS=linux/amd64,linux/arm,linux/arm64
$ # --platforms: which platforms the manifest list includes
$ # --template:  ARCH is a placeholder for the real architecture
$ # --target:    the name of the resulting manifest list
$ ./manifest-tool push from-args \
    --platforms $PLATFORMS \
    --template luxas/my-app-ARCH:v1.0.0 \
    --target luxas/my-app:v1.0.0
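To see which per-architecture images the template refers to, the ARCH substitution can be mimicked in plain shell. This is a sketch of the idea, not manifest-tool's implementation. The `expand_template` helper is hypothetical and only handles simple `os/arch` platforms, without variants.

```shell
#!/bin/sh
# Expand an image template once per platform by replacing the literal
# placeholder ARCH with the architecture part of "os/arch".
expand_template() {
  # $1 = comma-separated platforms, $2 = image template
  for platform in $(echo "$1" | tr ',' ' '); do
    arch=${platform#*/}   # strip the "os/" prefix
    echo "$2" | sed "s|ARCH|$arch|g"
  done
}

expand_template linux/amd64,linux/arm,linux/arm64 luxas/my-app-ARCH:v1.0.0
```

Each printed name must already exist in the registry, which is why the three `docker push` commands come first.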
v1.2:
- The first release I participated in; I made the release bundle include ARM 32-bit binaries
v1.3:
- Server docker images are released for ARM, both 32 and 64-bit
- kubelet chooses the right pause image and registers itself with beta.kubernetes.io/{os,arch}
v1.4:
- kubeadm released as an official deployment method that supports ARM 32 and 64-bit
- Unfortunately, I had to use a patched Golang version for building ARM 32-bit binaries...
v1.6:
- The patched Golang version for ARM could be removed.
- I re-enabled ppc64le builds, and the community contributed s390x builds.
How has the Kubernetes road to multiarch been?
Demo!
Set up a cluster consisting of
2x Up Board
2x Odroid C2
3x Raspberry Pi 3
With kubeadm this gets easy
KUBE_HYPERKUBE_IMAGE=luxas/hyperkube:v1.6.0-kubeadm-workshop-2 kubeadm-new init --config kubeadm.yaml
sudo cp /etc/kubernetes/admin.conf $HOME/
sudo chown $(id -u):$(id -g) $HOME/admin.conf
export KUBECONFIG=$HOME/admin.conf
kubectl apply -f weave.yaml
kubectl taint no pi5 beta.kubernetes.io/arch=arm64:NoSchedule
kubectl taint no pi6 pi7 beta.kubernetes.io/arch=arm:NoSchedule
# Create the Dashboard Deployment and Service
kubectl apply -f demos/dashboard/dashboard.yaml
# Create the Heapster Deployment and Service
kubectl apply -f demos/monitoring/heapster.yaml
# Deploy Traefik as the Ingress Controller and use Ngrok to
# expose the Traefik Service to the Internet
kubectl apply -f demos/loadbalancing/traefik-common.yaml
kubectl apply -f demos/loadbalancing/traefik-ngrok.yaml
# Expose the Dashboard to the world
kubectl apply -f demos/dashboard/ingress.yaml
# Get the public ngrok URL
curl -sSL $(kubectl -n kube-system get svc ngrok -o template --template \
"{{.spec.clusterIP}}")/api/tunnels | jq ".tunnels[].public_url" | sed 's/"//g;/http:/d'
# Create InfluxDB and Grafana for saving the Heapster data
kubectl apply -f demos/monitoring/influx-grafana.yaml
# Create the Prometheus Operator, a Prometheus instance and a sample metrics app
kubectl apply -f demos/monitoring/prometheus-operator.yaml
kubectl apply -f demos/monitoring/sample-prometheus-instance.yaml
# Create a Custom Metrics API server
kubectl apply -f demos/monitoring/custom-metrics.yaml
$ kubectl get no -owide
NAME       STATUS     AGE   VERSION                                    EXTERNAL-IP   OS-IMAGE                        KERNEL-VERSION
pi5        Ready      42m   v1.6.0-beta.4                              <none>        Debian GNU/Linux 8 (jessie)     4.9.13-bee42-v8
pi6        Ready      43m   v1.6.0-beta.4                              <none>        Raspbian GNU/Linux 8 (jessie)   4.4.50-hypriotos-v7+
pi7        NotReady   43m   v1.6.0-beta.4                              <none>        Raspbian GNU/Linux 8 (jessie)   4.4.50-hypriotos-v7+
upboard1   Ready      46m   v1.7.0-alpha.0.1446+33eb8794c93d5b-dirty   <none>        Ubuntu 16.04.2 LTS              4.4.0-67-generic
upboard2   NotReady   43m   v1.6.0-beta.4                              <none>        Ubuntu 16.04.2 LTS              4.4.0-66-generic
$ kubectl get po --all-namespaces -owide
NAMESPACE        NAME                                                   READY   STATUS    RESTARTS   AGE   IP                NODE
custom-metrics   custom-metrics-apiserver-2410399496-j6dg1              1/1     Running   0          22m   10.47.0.2         pi6
default          prometheus-operator-1505754769-n7kj8                   1/1     Running   0          30m   10.44.0.4         upboard2
default          prometheus-sample-metrics-prom-0                       2/2     Running   0          29m   10.44.0.8         upboard2
default          sample-metrics-app-2440858958-1h5wf                    1/1     Running   0          1m    10.45.0.6         pi5
default          sample-metrics-app-2440858958-35fdz                    1/1     Running   0          1m    10.44.0.11        upboard2
default          sample-metrics-app-2440858958-56r2x                    1/1     Running   0          1m    10.44.0.9         upboard2
default          sample-metrics-app-2440858958-9grc1                    1/1     Running   0          29m   10.45.0.4         pi5
default          sample-metrics-app-2440858958-f5w1t                    1/1     Running   0          4m    10.45.0.5         pi5
default          sample-metrics-app-2440858958-km3gq                    1/1     Running   0          12m   10.47.0.3         pi6
default          sample-metrics-app-2440858958-lntqp                    1/1     Running   0          1m    10.47.0.5         pi6
default          sample-metrics-app-2440858958-nst8h                    1/1     Running   0          4m    10.47.0.4         pi6
kube-system      etcd-upboard1                                          1/1     Running   0          44m   192.168.200.211   upboard1
kube-system      heapster-57121549-mtx6f                                1/1     Running   0          41m   10.44.0.2         upboard2
kube-system      kube-dns-3913472980-l3rkl                              3/3     Running   0          44m   10.32.0.2         upboard1
kube-system      kube-proxy-0jwxh                                       1/1     Running   0          42m   192.168.200.215   pi5
kube-system      kube-proxy-7ks9n                                       1/1     Running   0          45m   192.168.200.211   upboard1
kube-system      kube-proxy-ktxqd                                       1/1     Running   0          43m   192.168.200.212   upboard2
kube-system      kube-proxy-snp6v                                       1/1     Running   0          43m   192.168.200.216   pi6
kube-system      kubernetes-dashboard-2731141917-rdbj2                  1/1     Running   0          41m   10.44.0.1         upboard2
kube-system      monitoring-grafana-4071825559-rbs3w                    1/1     Running   0          34m   10.45.0.2         pi5
kube-system      monitoring-influxdb-1373127269-pzwhx                   1/1     Running   0          34m   10.45.0.3         pi5
kube-system      ngrok-3984100120-f5900                                 1/1     Running   0          41m   10.44.0.3         upboard2
kube-system      pv-controller-manager-3769581161-dcn66                 1/1     Running   0          40m   10.47.0.1         pi6
kube-system      self-hosted-kube-apiserver-kk6hk                       1/1     Running   1          45m   192.168.200.211   upboard1
kube-system      self-hosted-kube-controller-manager-1546170996-40n6g   1/1     Running   0          45m   192.168.200.211   upboard1
kube-system      self-hosted-kube-scheduler-3991062876-6s94c            1/1     Running   1          45m   192.168.200.211   upboard1
kube-system      traefik-ingress-controller-3665677306-f5dhj            1/1     Running   0          41m   10.45.0.1         pi5
kube-system      weave-net-3h3xm                                        2/2     Running   0          42m   192.168.200.215   pi5
kube-system      weave-net-f7wwj                                        2/2     Running   0          43m   192.168.200.212   upboard2
kube-system      weave-net-kxcr2                                        2/2     Running   0          43m   192.168.200.216   pi6
kube-system      weave-net-n3tvh                                        2/2     Running   0          44m   192.168.200.211   upboard1
wardle           wardle-apiserver-3982025089-3grzx                      2/2     Running   0          32m   10.44.0.10        upboard2
$ kubectl get svc --all-namespaces
NAMESPACE        NAME                         CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
custom-metrics   api                          10.100.246.198   <none>        443/TCP          35m
default          kubernetes                   10.96.0.1        <none>        443/TCP          52m
default          prometheus-operated          None             <none>        9090/TCP         35m
default          sample-metrics-app           10.97.141.133    <none>        8080/TCP         35m
default          sample-metrics-prom          10.105.118.16    <nodes>       9090:30999/TCP   35m
kube-system      heapster                     10.107.113.203   <none>        80/TCP           48m
kube-system      kube-dns                     10.96.0.10       <none>        53/UDP,53/TCP    52m
kube-system      kubernetes-dashboard         10.99.233.145    <none>        80/TCP           48m
kube-system      monitoring-grafana           10.105.105.151   <none>        80/TCP           44m
kube-system      monitoring-influxdb          10.99.193.162    <none>        8086/TCP         44m
kube-system      ngrok                        10.107.224.120   <none>        80/TCP           48m
kube-system      traefik-ingress-controller   10.102.162.4     <none>        80/TCP           48m
kube-system      traefik-web                  10.109.90.245    <none>        80/TCP           48m
wardle           api                          10.99.51.75      <none>        443/TCP          39m
$ kubectl top node
NAME       CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
pi5        352m         8%     487Mi           56%
upboard1   428m         10%    984Mi           75%
pi6        414m         10%    449Mi           58%
$ kubectl api-versions
apiregistration.k8s.io/v1alpha1
apps/v1beta1
authentication.k8s.io/v1
authentication.k8s.io/v1beta1
authorization.k8s.io/v1
authorization.k8s.io/v1beta1
autoscaling/v1
autoscaling/v2alpha1
batch/v1
batch/v2alpha1
certificates.k8s.io/v1beta1
custom-metrics.metrics.k8s.io/v1alpha1
extensions/v1beta1
monitoring.coreos.com/v1alpha1
policy/v1beta1
rbac.authorization.k8s.io/v1alpha1
rbac.authorization.k8s.io/v1beta1
rook.io/v1beta1
settings.k8s.io/v1alpha1
storage.k8s.io/v1
storage.k8s.io/v1beta1
v1
wardle.k8s.io/v1alpha1
$ kubectl apply -f demos/sample-apiserver/my-flunder.yaml
flunder "my-first-flunder" configured
$ kubectl get flunders
NAME               KIND
my-first-flunder   Flunder.v1alpha1.wardle.k8s.io
$ curl -sSLk https://10.100.246.198/apis/custom-metrics.metrics.k8s.io/v1alpha1\
/namespaces/default/services/sample-metrics-app/http_requests_total
{
  "kind": "MetricValueList",
  "apiVersion": "custom-metrics.metrics.k8s.io/v1alpha1",
  "metadata": {},
  "items": [
    {
      "describedObject": {
        "kind": "Service",
        "namespace": "default",
        "name": "sample-metrics-app",
        "apiVersion": "/__internal"
      },
      "metricName": "http_requests_total",
      "timestamp": "2017-03-24T13:14:13Z",
      "window": 60,
      "value": "299m"
    }
  ]
}
$ kubectl get hpa
NAME                     REFERENCE                       TARGETS      MINPODS   MAXPODS   REPLICAS   AGE
sample-metrics-app-hpa   Deployment/sample-metrics-app   333m / 100   2         10        10         31m
What now?
Roadmap and help-wanted issues
The current situation is ok and works, but it could obviously be improved. Here are some shout-outs to the community:
- Automated CI testing for the other architectures using kubeadm
  - We might be able to use the CNCF cluster here?
- Formalize a standard specification for how Kubernetes binaries should be compiled and how server images should be built
  - Official Kubernetes projects should publish binaries for at least amd64, arm, arm64, ppc64le, s390x and windows (node only)
- Manifest lists should be built for the server images
  - This is blocked on gcr.io not supporting v2 schema 2 :(
  - Implement this feature in other CRI-compliant implementations
- Create an external Admission Controller that applies platform data
What's yet to be done here?
Thank you for listening!
github.com/luxas
twitter.com/kubernetesonarm
luxaslabs.com
Bochum Presentation
By lucask