Florian Dambrine
I am a Freelance DevOps Engineer graduated from UTC (University of Technology of Compiègne) in 2014. I am a DevOps enthusiast embracing Cloud computing technologies to build automated infrastructure at large scale.
Knowledge worth sharing
#03
Florian Dambrine - Principal Engineer - @GumGum
***
***
***
Kubernetes is a portable, extensible, open-source platform for managing containerized workloads and services
ECS is a fully managed container orchestration service
cluster
service / deployment
pods / tasks
volumes / autoscaling
Cluster | Cluster |
Service & Task definition | Deployment |
Task | Pod |
Volume | PersistentVolume |
Service
Tasks
Task definition
### Requirements:
### * Install aws-cli
aws ecs list-clusters
{
"clusterArns": [
"arn:aws:ecs:us-east-1:123456789910:cluster/va-mle--prod",
"arn:aws:ecs:us-east-1:123456789910:cluster/va-mle-inference--prod",
...
]
}
### Requirements:
### * Install Kubectl EKS Vendored (mac version)
### * Install aws-cli >= 1.16.156 (replacement of aws-iam-authenticator)
### Update `~/.kube/config` with EKS cluster config and alias it as `k8s-mle`
aws eks update-kubeconfig --name va-verity-prod-eks --alias verity-prod
### Set your client to connect to verity-prod in namespace monitoring
kubectl config set-context verity-prod --namespace=monitoring \
&& kubectl config
### List nodes running in the cluster
kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-10-201-116-76.ec2.internal Ready <none> 16h v1.17.9-eks-4c6976
ip-10-201-24-166.ec2.internal Ready <none> 22m v1.17.9-eks-4c6976
ip-10-201-41-15.ec2.internal Ready <none> 13h v1.17.9-eks-4c6976
### List services running in the cluster va-mle--prod
aws ecs list-clusters
{
"clusterArns": [
"arn:aws:ecs:us-east-1:123456789910:cluster/va-mle--prod",
"arn:aws:ecs:us-east-1:123456789910:cluster/va-mle-inference--prod",
...
]
}
### Set your client to connect to verity-prod in namespace default
kubectl config set-context verity-prod --namespace=default \
&& kubectl config use-context verity-prod
/// OR ///
## Switch to cluster verity-prod namespace default
kubectx verity-prod
kubens default
### List pods running in the cluster namespace
kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-10-201-116-76.ec2.internal Ready <none> 16h v1.17.9-eks-4c6976
ip-10-201-24-166.ec2.internal Ready <none> 22m v1.17.9-eks-4c6976
ip-10-201-41-15.ec2.internal Ready <none> 13h v1.17.9-eks-4c6976
### List services running in the cluster va-mle--prod
aws ecs list-services --cluster va-mle--prod
{
"serviceArns": [
"arn:aws:ecs:us-east-1:12345678910:service/video-transcribe__prod",
...
]
}
## List all running deployments in the cluster
kubectl get deployments --all-namespaces
NAME READY UP-TO-DATE AVAILABLE AGE
ai-kafka-lag-reporting--production-burrow 1/1 1 1 23d
ai-kafka-lag-reporting--production-karrot 1/1 1 1 23d
ai-kafka-lag-reporting--staging-burrow 1/1 1 1 16d
ai-kafka-lag-reporting--staging-karrot 1/1 1 1 16d
kafka-manager--production 1/1 1 1 23d
...
### List services running in the cluster va-mle--prod
aws ecs list-tasks --cluster va-mle--prod
{
"taskArns": [
"arn:aws:ecs:us-east-1:12345678910:task/va-mle--prod/08e6004b550c4adc8d760e6ef8618482",
...
]
}
## List all running pods in the cluster
kubectl get pods --all-namespaces
NAME READY STATUS RESTARTS AGE
ai-kafka-lag-reporting--production-burrow-695d887765-6g2mz 1/1 Running 0 59m
ai-kafka-lag-reporting--production-karrot-769748dcd9-wl7bc 1/1 Running 0 59m
ai-kafka-lag-reporting--staging-burrow-75f69ff888-k5hcq 1/1 Running 0 59m
ai-kafka-lag-reporting--staging-karrot-6b9d66d985-hhr7w 1/1 Running 0 59m
kafka-manager--production-6dcb578fc-bnfpb 1/1 Running 0 59m
prometheus-mle--production-alertmanager-6f57c4684b-dw9xs 2/2 Running 0 14h
prometheus-mle--production-kube-state-metrics-6c88687bd8-jkmfc 1/1 Running 0 59m
prometheus-mle--production-pushgateway-5b9dbd7f94-s9rb2 1/1 Running 0 59m
prometheus-mle--production-server-688cb4bf47-bdrw2 2/2 Running 0 59m
prometheus-nlp--production-alertmanager-cb88575cb-58xxf 2/2 Running 0 59m
prometheus-nlp--production-kube-state-metrics-5fb889bbc8-qnkzp 1/1 Running 0 59m
prometheus-nlp--production-pushgateway-64f76bfc48-rv2qb 1/1 Running 0 59m
prometheus-nlp--production-server-7565ffb9b9-rk6bn 2/2 Running 0 16h
prometheus-verity--production-alertmanager-6794ddd944-8chvf 2/2 Running 0 59m
### Inspect task specs from cluster va-mle--prod
aws ecs describe-tasks --cluster va-mle--prod --task-arn arn:aws:ecs:us-east-1:12345678910:task/va-mle--prod/08e6004b550c4adc8d760e6ef8618482
{
"tasks": [
{
"attachments": [],
"availabilityZone": "us-east-1e",
"clusterArn": "arn:aws:ecs:us-east-1:12345678910:cluster/va-mle--prod",
"connectivity": "CONNECTED",
"connectivityAt": 1603390576.188,
"containerInstanceArn": "arn:aws:ecs:us-east-1:12345678910:container-instance/va-mle--prod/8e5d58e75ce5423a91550ff7731ad373",
"containers": [
{
"containerArn": "arn:aws:ecs:us-east-1:12345678910:container/d72d8b0d-fd0f-43ab-8cbb-394a092f9cc1",
"taskArn": "arn:aws:ecs:us-east-1:12345678910:task/va-mle--prod/08e6004b550c4adc8d760e6ef8618482",
"name": "prism-api",
"image": "12345678910.dkr.ecr.us-east-1.amazonaws.com/gumgum/machine-learning-engineering/prism-api:0.9.3",
"imageDigest": "sha256:ad8d6a83347f83c862cfcae638bafa4cf0cd6fd53ae646b5a6c0be875ec2b3dd",
"runtimeId": "88ba0668d15aac5186f3a88ba68f802f952aa4641192661589fdba8b5e972a0b",
"lastStatus": "RUNNING",
"networkBindings": [
{
"bindIP": "0.0.0.0",
"containerPort": 8080,
"hostPort": 32774,
"protocol": "tcp"
},
{
"bindIP": "0.0.0.0",
"containerPort": 9090,
"hostPort": 32773,
"protocol": "tcp"
}
],
"networkInterfaces": [],
"healthStatus": "UNKNOWN",
"cpu": "1536",
"memory": "3584"
}
],
"cpu": "1536",
"createdAt": 1603390576.188,
"desiredStatus": "RUNNING",
"group": "service:prism-api__advertising-prod",
"healthStatus": "UNKNOWN",
"lastStatus": "RUNNING",
"launchType": "EC2",
"memory": "3584",
"overrides": {
"containerOverrides": [
{
"name": "prism-api"
}
],
"inferenceAcceleratorOverrides": []
},
"pullStartedAt": 1603390576.564,
"pullStoppedAt": 1603390577.564,
"startedAt": 1603390577.564,
"startedBy": "ecs-svc/3925212241363435971",
"tags": [],
"taskArn": "arn:aws:ecs:us-east-1:12345678910:task/va-mle--prod/08e6004b550c4adc8d760e6ef8618482",
"taskDefinitionArn": "arn:aws:ecs:us-east-1:12345678910:task-definition/prism-api__advertising-prod:11",
"version": 2
}
],
"failures": []
}
## Inspect pod spec
kubectl describe pod ai-kafka-lag-reporting--production-burrow-695d887765-6g2mz
Name: ai-kafka-lag-reporting--production-burrow-695d887765-6g2mz
Namespace: monitoring
Priority: 0
Node: ip-10-201-41-15.ec2.internal/10.201.41.15
Start Time: Fri, 23 Oct 2020 06:18:23 -0700
Labels: app.kubernetes.io/instance=ai-kafka-lag-reporting--production
app.kubernetes.io/name=burrow
pod-template-hash=695d887765
Annotations: checksum/config: 76685e6b14d969203808ad9b123ce98882855ce11558a90291622c008cbb65f1
checksum/templates: 2f6e14d9dc3289a56da3ee67666dff22aa865cbac57762a4efde12fd200a4437
kubernetes.io/psp: eks.privileged
Status: Running
IP: 10.201.38.236
IPs:
IP: 10.201.38.236
Controlled By: ReplicaSet/ai-kafka-lag-reporting--production-burrow-695d887765
Containers:
burrow:
Container ID: docker://e4107e28b35f093f3f48ab2f80f0895971d325507cd499c0dba7dc636a3ba1f4
Image: ifoodhub/burrow:1.3.3
Image ID: docker-pullable://ifoodhub/burrow@sha256:ed9b8629983eddf496fc953fbe053db78370db594bd3cf1541c38c03c8b7b5b1
Port: 8000/TCP
Host Port: 0/TCP
State: Running
Started: Fri, 23 Oct 2020 06:18:27 -0700
Ready: True
Restart Count: 0
Liveness: http-get http://:http/burrow/admin delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:http/burrow/admin delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/burrow from config (rw)
/etc/burrow/templates from templates (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-btxjt (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: ai-kafka-lag-reporting--production-burrow
Optional: false
templates:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: ai-kafka-lag-reporting--production-burrow-templates
Optional: false
default-token-btxjt:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-btxjt
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events: <none>
### Inspect logs of a container from cluster va-mle--prod
# ¯\_(ツ)_/¯
ssh ubuntu@<instance>
docker ps
docker logs -f <container-id>
{"level":"warn","ts":1603459219.4788227,"msg":"unknown consumer","type":"module","coordinator":"storage","class":"inmemory","name":"default","worker":2,"cluster":"va-verity-kafka","consumer":"KMOffsetCache-kafka-manager--production-6dcb578fc-bnfpb","topic":"","partition":0,"topic_partition_count":0,"offset":0,"timestamp":0,"owner":"","client_id":"","request":"StorageFetchConsumer"}
{"level":"info","ts":1603459219.478872,"msg":"cluster or consumer not found","type":"module","coordinator":"evaluator","class":"caching","name":"default","cluster":"va-verity-kafka","consumer":"KMOffsetCache-kafka-manager--production-6dcb578fc-bnfpb","showall":false}
## Check pod logs
# ❤
kubectl logs -f ai-kafka-lag-reporting--production-burrow-695d887765-6g2mz
{"level":"warn","ts":1603459219.4788227,"msg":"unknown consumer","type":"module","coordinator":"storage","class":"inmemory","name":"default","worker":2,"cluster":"va-verity-kafka","consumer":"KMOffsetCache-kafka-manager--production-6dcb578fc-bnfpb","topic":"","partition":0,"topic_partition_count":0,"offset":0,"timestamp":0,"owner":"","client_id":"","request":"StorageFetchConsumer"}
{"level":"info","ts":1603459219.478872,"msg":"cluster or consumer not found","type":"module","coordinator":"evaluator","class":"caching","name":"default","cluster":"va-verity-kafka","consumer":"KMOffsetCache-kafka-manager--production-6dcb578fc-bnfpb","showall":false}
~ Training / Development ~
HelmFILE is a wrapper on top of helm
Helmfile is what Terragrunt is to Terraform
Helm is a package manager for Kubernetes
Helm is your new ecs-cli
Kubernetes UIs - Blog Post 04/05/2020 by William Lam
Helm is a package manager for Kubernetes
Helm is your new ecs-cli...
### Requirements:
### * Install helm > v3.0
helm create ops-talks
Creating ops-talks
# Tree of the created chart
ops-talks
├── Chart.yaml
├── charts
├── templates
│ ├── NOTES.txt
│ ├── _helpers.tpl
│ ├── deployment.yaml
│ ├── hpa.yaml
│ ├── ingress.yaml
│ ├── service.yaml
│ ├── serviceaccount.yaml
│ └── tests
│ └── test-connection.yaml
└── values.yaml
3 directories, 10 files
$ helm install confluentinc/cp-helm-charts
$ helm install jenkins/jenkins
https://github.com/jenkinsci/helm-charts
https://github.com/confluentinc/cp-helm
https://github.com/helm/charts/
https://github.com/Lowess/helm-charts
$ helm install stable/atlantis
$ helm install stable/atlantis
stable/atlantis
$ helm install stable/atlantis
values.yaml
stable/atlantis
$ helm install stable/atlantis
values.yaml (default)
# Replace this with your own repo whitelist:
orgWhitelist: bitbucket.org/gumgum/*
logLevel: "debug"
myvalues.yaml (overrides)
myvalues.yaml
merge
HELMFILE IS A WRAPPER ON TOP OF HELM
Helmfile is what Terragrunt is to Terraform...
Give each Helm chart its own helmfile.yaml and include them recursively in a centralized helmfile.yaml.
Separate out environment specific values from general values. Often you’ll find while a Helm chart can take 50 different values, only a few actually differ between your environments.
As well as providing a set of values, either Environment specific or otherwise, you can also read Environment Variables, Execute scripts and read their output (Fetch a secret from AWS SSM)
Store remote state in git/s3/fileshare/etc in much the same way as Terraform does.
bases:
- ../../environments.yaml
---
repositories:
# Use Lowess (Florian Dambrine) OSS helm chart repo
- name: lowess-helm
url: https://lowess.github.io/helm-charts
templates:
default: &default
chart: "lowess-helm/karrot"
missingFileHandler: Error
namespace: "monitoring"
labels: {}
version: "0.1.3"
wait: true
installed: {{ and (env "KAFKA_LAG_REPORTING_INSTALLED" | default "true") }}
releases:
- name: "ai-kafka-lag-reporting--{{ .Environment.Name }}"
<<: *default
values:
- "./values/{{`{{ .Release.Name }}`}}.yaml"
helmfile.yaml
environments.yaml
releases
├── kafka-lag-reporting
│ ├── README.md
│ ├── helmfile.yaml
│ └── values
│ ├── ai-kafka-lag-reporting--production.yaml
│ ├── ai-kafka-lag-reporting--staging.yaml
│ └── kafka-lag-reporting--production.yaml
├── kafka-manager
│ ├── helmfile.yaml
│ └── values
│ └── kafka-manager--production.yaml
├── prometheus
│ ├── helmfile.yaml
│ └── values
│ ├── prometheus-mle--production.yaml
│ ├── prometheus-nlp--production.yaml
│ └── prometheus-verity--production.yaml
└── zoonavigator
├── helmfile.yaml
└── values
└── verity-zoonavigator--production.yaml
By Florian
By Florian Dambrine
OpsTalks #03 - K8s & ECS
I am a Freelance DevOps Engineer graduated from UTC (University of Technology of Compiègne) in 2014. I am a DevOps enthusiast embracing Cloud computing technologies to build automated infrastructure at large scale.