Dev Ops & SRE
for Patek.cz
but that's not what this is about ^^
Petr Michalec
Works as an SRE at F5; previously at Volterra.io, Mirantis, IBM, ...
n(vi)m lover. developer. geek. quad FPV pilot.
all with a passion for the edge thing
On Twitter as @epcim.
development and operation of modern applications and infrastructure
- What you want to be
- What you do when you create
Buzzword
So what is DevOps?
GitOps
SecOps
NetOps
DevOps
DataOps
AIOps
5G
AI
hyperautomation
XR
App
- a good idea
- write it well
- package it
- distribute it
- run it *
  - on a PC
  - on a phone
  - in a container
  - in the cloud
  - across clouds, edge, IoT
  - ...
App
![](https://s3.amazonaws.com/media-p.slid.es/uploads/81113/images/9343301/pasted-from-clipboard.png)
App
![](https://s3.amazonaws.com/media-p.slid.es/uploads/81113/images/9343330/pasted-from-clipboard.png)
App
![](https://s3.amazonaws.com/media-p.slid.es/uploads/81113/images/9343296/pasted-from-clipboard.png)
App
![](https://s3.amazonaws.com/media-p.slid.es/uploads/81113/images/9344271/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/81113/images/9344272/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/81113/images/9343305/pasted-from-clipboard.png)
App
![](https://s3.amazonaws.com/media-p.slid.es/uploads/81113/images/9346036/pasted-from-clipboard.png)
Trend
![](https://s3.amazonaws.com/media-p.slid.es/uploads/81113/images/9346301/pasted-from-clipboard.png)
DevOps
- "Dev" stands for development (UX, frontend, backend, ...)
- "Ops" stands for keeping things running (IT operations)
"SRE" is actually a job role; it is also defined as software engineering principles and practices applied to infrastructure and operations
So what is DevOps?
pattern
Gene Kim, Co-Author of: “The Phoenix Project: A Novel About IT, DevOps, and Helping Your Business Win.”
![](https://s3.amazonaws.com/media-p.slid.es/uploads/81113/images/2311075/devops1.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/81113/images/2311076/devops2.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/81113/images/2311077/devops3.png)
So what is DevOps?
![](https://s3.amazonaws.com/media-p.slid.es/uploads/81113/images/9342625/pasted-from-clipboard.png)
...being able to deploy and operate both applications and infrastructure as code
A cultural and professional movement, focused on how we build and operate high velocity organizations, born from the experiences of its practitioners.
Or also
How
Why
In short
![](https://media2.giphy.com/media/AUMxbiDIzBOiFGA71r/giphy.gif)
Application
Logic
In short
![](https://media2.giphy.com/media/AUMxbiDIzBOiFGA71r/giphy.gif)
![](https://media2.giphy.com/media/AUMxbiDIzBOiFGA71r/giphy.gif)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/81113/images/9346462/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/81113/images/9346473/pasted-from-clipboard.png)
Application
Container
Orchestration
Continuous is a buzzword too
- Integration (because microservices)
- Deployment (because infrastructure as code)
- Delivery (because business needs)
- Measure/Improve (to improve)
- Validation (because security, audit)
What next?
- What tools we use
- Demo
- What managing applications distributed across the cloud looks like
Glue
Git
A tool for distributed version control of source code.
Originally built for the Linux kernel. (Others: Bzr, Hg, Svn.)
https://git-scm.com/book/cs/v2
Git
git clone https://github.com/GoogleContainerTools/kpt-functions-catalog
git show
git diff
git pull -r
git checkout -b new-branch
git cherry-pick dbd4b43
git commit
git rebase origin/main
git rebase -i HEAD~5
Golang
A programming language from Google (design started in 2007).
High performance and fast development. Powerful standard library.
Golang
package main

import "fmt"

func main() {
	fmt.Println("hello world")
}
$ go build hello-world.go
$ ls
hello-world hello-world.go
$ ./hello-world
hello world
Golang
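// GotplRenderBuf renders the template file t into the out buffer.
// Assumes the enclosing file imports bytes, fmt, os, text/template and sprig.
func (p *RenderPlugin) GotplRenderBuf(t string, out *bytes.Buffer) error {
	// read template (os.ReadFile replaces the deprecated ioutil.ReadFile since Go 1.16)
	tContent, err := os.ReadFile(t)
	if err != nil {
		return fmt.Errorf("read template failed: %w", err)
	}

	// init: merge the sprig function map with the custom functions
	fMap := sprig.TxtFuncMap()
	for k, v := range SprigCustomFuncs {
		fMap[k] = v
	}

	// parse: return the error instead of panicking via template.Must
	tpl, err := template.New(t).Funcs(fMap).Parse(string(tContent))
	if err != nil {
		return fmt.Errorf("parse template failed: %w", err)
	}

	// render
	if err := tpl.Execute(out, p.Values); err != nil {
		return fmt.Errorf("render template failed: %w", err)
	}
	return nil
}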
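The same parse-then-execute pattern can be tried without any external dependency. The sketch below is only an illustration: it uses the standard library alone, and `renderString` and the `upper` helper are hypothetical names standing in for the plugin method and the sprig function map.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
	"text/template"
)

// renderString parses tplText with a small custom function map and
// executes it against values, returning the rendered text.
// Hypothetical helper, not part of the plugin shown above.
func renderString(tplText string, values map[string]string) (string, error) {
	// "upper" stands in for the much larger sprig function map.
	funcs := template.FuncMap{
		"upper": strings.ToUpper,
	}

	// Funcs must be registered before Parse so the parser knows the names.
	tpl, err := template.New("inline").Funcs(funcs).Parse(tplText)
	if err != nil {
		return "", fmt.Errorf("parse template failed: %w", err)
	}

	var out bytes.Buffer
	if err := tpl.Execute(&out, values); err != nil {
		return "", fmt.Errorf("render template failed: %w", err)
	}
	return out.String(), nil
}

func main() {
	s, err := renderString("region={{ .region | upper }}", map[string]string{"region": "eu-west-1"})
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

Registering functions before parsing matters: a template that calls an unknown function fails at parse time, which is why the function map is built first.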
It also depends on what for...
- Python
- JavaScript / TypeScript
Terraform
A tool that uses a declarative language (DSL), idempotent configuration, and a CLI to manage remote ...... through their APIs
https://registry.terraform.io/browse/providers
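What "declarative" and "idempotent" mean in practice can be sketched in a few lines. This is a toy model, not how Terraform is implemented: the `State` type and the `plan` and `apply` functions are hypothetical, and a resource is reduced to a name plus one configuration string.

```go
package main

import (
	"fmt"
	"sort"
)

// State maps a resource name to its configuration (toy model).
type State map[string]string

// plan compares desired and actual state and lists the actions needed
// to converge. The result is sorted so the output is stable.
func plan(desired, actual State) []string {
	var actions []string
	for name, cfg := range desired {
		cur, ok := actual[name]
		switch {
		case !ok:
			actions = append(actions, "create "+name)
		case cur != cfg:
			actions = append(actions, "update "+name)
		}
	}
	for name := range actual {
		if _, ok := desired[name]; !ok {
			actions = append(actions, "delete "+name)
		}
	}
	sort.Strings(actions)
	return actions
}

// apply returns a new actual state that matches the desired one.
func apply(desired State) State {
	next := make(State, len(desired))
	for k, v := range desired {
		next[k] = v
	}
	return next
}

func main() {
	desired := State{"bucket": "public-read", "cdn": "PriceClass_100"}
	actual := State{"bucket": "private", "old-zone": "x"}

	fmt.Println("first plan: ", plan(desired, actual))
	actual = apply(desired)
	fmt.Println("second plan:", plan(desired, actual)) // nothing left to do
}
```

The second plan is empty: running the same configuration again changes nothing, which is the property that makes repeated runs safe.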
Terraform
![](https://s3.amazonaws.com/media-p.slid.es/uploads/81113/images/9346202/pasted-from-clipboard.png)
Terraform
variable "aws_region" {
  default = "eu-west-1"
}

variable "domain" {
  default = "my_domain"
}

provider "aws" {
  region = "${var.aws_region}"
}

# Note: The bucket name needs to carry the same name as the domain!
# http://stackoverflow.com/a/5048129/2966951
resource "aws_s3_bucket" "site" {
  bucket = "${var.domain}"
  acl    = "public-read"

  policy = <<EOF
{
  "Version":"2008-10-17",
  "Statement":[{
    "Sid":"AllowPublicRead",
    "Effect":"Allow",
    "Principal": {"AWS": "*"},
    "Action":["s3:GetObject"],
    "Resource":["arn:aws:s3:::${var.domain}/*"]
  }]
}
EOF

  website {
    index_document = "index.html"
  }
}

# Note: Creating this route53 zone is not enough. The domain's name servers need to point to the NS
# servers of the route53 zone. Otherwise the DNS lookup will fail.
# To verify that the dns lookup succeeds: `dig site @nameserver`
resource "aws_route53_zone" "main" {
  name = "${var.domain}"
}

resource "aws_route53_record" "root_domain" {
  zone_id = "${aws_route53_zone.main.zone_id}"
  name    = "${var.domain}"
  type    = "A"

  alias {
    name                   = "${aws_cloudfront_distribution.cdn.domain_name}"
    zone_id                = "${aws_cloudfront_distribution.cdn.hosted_zone_id}"
    evaluate_target_health = false
  }
}

resource "aws_cloudfront_distribution" "cdn" {
  origin {
    origin_id   = "${var.domain}"
    domain_name = "${var.domain}.s3.amazonaws.com"
  }

  # If using route53 aliases for DNS we need to declare it here too, otherwise we'll get 403s.
  aliases = ["${var.domain}"]

  enabled             = true
  default_root_object = "index.html"

  default_cache_behavior {
    allowed_methods  = ["GET", "HEAD", "OPTIONS"]
    cached_methods   = ["GET", "HEAD"]
    target_origin_id = "${var.domain}"

    forwarded_values {
      query_string = true
      cookies {
        forward = "none"
      }
    }

    viewer_protocol_policy = "allow-all"
    min_ttl                = 0
    default_ttl            = 3600
    max_ttl                = 86400
  }

  # The cheapest priceclass
  price_class = "PriceClass_100"

  # This is required to be specified even if it's not used.
  restrictions {
    geo_restriction {
      restriction_type = "none"
      locations        = []
    }
  }

  viewer_certificate {
    cloudfront_default_certificate = true
  }
}

output "s3_website_endpoint" {
  value = "${aws_s3_bucket.site.website_endpoint}"
}

output "route53_domain" {
  value = "${aws_route53_record.root_domain.fqdn}"
}

output "cdn_domain" {
  value = "${aws_cloudfront_distribution.cdn.domain_name}"
}
Ansible
Also a tool with declarative configuration and a CLI, and it also lets you manage remote "servers, routers".
So-called configuration management.
https://www.guru99.com/ansible-tutorial.html
(Others: Salt, https://saltproject.io/ ).
Ansible
# playbook-base.yaml
- name: base
  become: true
  become_method: sudo
  hosts: all
  roles:
    - sshd
    - users
    - system
    - netplan
    - network
    #- hardening
    #- cleanup
  serial: "{{ batch_size|default(10) }}"
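The `serial` setting makes the play run on a limited number of hosts at a time, so a bad change hits one batch before it can reach the rest. A rough sketch of that batching idea follows. The `batches` function and the host names are hypothetical, and treating any non-positive size as 10 is a simplification of the playbook's `default(10)`, which only applies when `batch_size` is undefined.

```go
package main

import "fmt"

// batches splits hosts into consecutive groups of at most size hosts.
// A non-positive size falls back to 10 (simplified stand-in for default(10)).
func batches(hosts []string, size int) [][]string {
	if size <= 0 {
		size = 10
	}
	var out [][]string
	for start := 0; start < len(hosts); start += size {
		end := start + size
		if end > len(hosts) {
			end = len(hosts)
		}
		out = append(out, hosts[start:end])
	}
	return out
}

func main() {
	// Hypothetical inventory.
	hosts := []string{"edge-1", "edge-2", "edge-3", "edge-4", "edge-5"}
	for i, b := range batches(hosts, 2) {
		fmt.Printf("batch %d: %v\n", i+1, b)
	}
}
```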
Ansible
- name: sshd configuration file
  template:
    src: sshd_config.metal.j2
    dest: "{{ sshdconfig }}"
    owner: 0
    group: 0
    mode: 0600
    validate: "sshd -T -f %s"
    backup: yes
  vars:
    sshd_allow_users: "{{ (admin_users + autom_users) | join(' ') }} "
  notify:
    - restart sshd
  # when: ansible_virtualization_role != "guest" or ansible_virtualization_type != "docker"
GitLab CI
GitLab is a DevOps platform! Hooray.
A server for Git repositories. Web UI. Issue tracking. And, above all, CI.
(Others: GitHub https://github.com, Gitea https://gitea.io ).
GitLab CI
![](https://s3.amazonaws.com/media-p.slid.es/uploads/81113/images/9346345/pasted-from-clipboard.png)
GitLab CI
![](https://s3.amazonaws.com/media-p.slid.es/uploads/81113/images/9346285/pasted-from-clipboard.png)
Argo CD
Declarative continuous delivery. For GitOps. We get it...
"infrastructure = code"; the screwdriver that installs that code into Kubernetes.
https://argoproj.github.io/cd/
Argo CD
![](https://s3.amazonaws.com/media-p.slid.es/uploads/81113/images/9346134/pasted-from-clipboard.png)
Argo CD
![](https://s3.amazonaws.com/media-p.slid.es/uploads/81113/images/9346135/pasted-from-clipboard.png)
Argo CD
![](https://s3.amazonaws.com/media-p.slid.es/uploads/81113/images/9346141/pasted-from-clipboard.png)
Prometheus
Originally from SoundCloud (2012).
multi-dimensional data model, operational simplicity, scalable data collection, and a powerful query language
Prometheus
{
  name: 'etcd-service',
  rules: [
    {
      alert: 'EtcdDatabaseSpaceFilling',
      expr: |||
        100 - (etcd_mvcc_db_total_size_in_bytes / etcd_server_quota_backend_bytes) * 100 < 15
      ||| % $._config,
      'for': '10m',
      labels: {
        severity: 'major',
        identifier: '{{ $labels.instance }}',
        group: 'Infrastructure',
        service_name: 'etcd',
        tenant: 'ves-sre',
      },
      annotations: {
        display_name: 'Database Error',
        description: 'Etcd database {{ $labels.instance }} is filling up. Only {{ $value }}% of space is available.',
      },
      docs:: {
        name: |||
          TODO
        |||,
        description: |||
          Etcd database is filling up.
        |||,
        action: |||
          This has to be solved in next business day working hours by L2.
        |||,
      },
    },
  ],
}
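The arithmetic in the alert expression and the effect of `for: 10m` can be worked through by hand. The sketch below is an approximation of the idea, not Prometheus's evaluation engine: `percentFree`, `sample`, `firing` and `minutes` are hypothetical names, and the byte values are made up.

```go
package main

import (
	"fmt"
	"time"
)

// percentFree mirrors the rule expression: 100 - (dbSize / quota) * 100.
func percentFree(dbSizeBytes, quotaBytes float64) float64 {
	return 100 - (dbSizeBytes/quotaBytes)*100
}

// sample is one evaluation of the expression, at an offset from the start.
type sample struct {
	at   time.Duration
	free float64
}

// minutes is a small helper to build offsets.
func minutes(n int) time.Duration {
	return time.Duration(n) * time.Minute
}

// firing reports whether free < threshold has held without interruption
// for at least forDur, counted back from the last sample.
func firing(samples []sample, threshold float64, forDur time.Duration) bool {
	if len(samples) == 0 {
		return false
	}
	last := samples[len(samples)-1]
	if last.free >= threshold {
		return false
	}
	since := last.at
	for i := len(samples) - 1; i >= 0; i-- {
		if samples[i].free >= threshold {
			break // the condition was not true here, the streak ends
		}
		since = samples[i].at
	}
	return last.at-since >= forDur
}

func main() {
	// Made-up numbers: 7 GiB used out of an 8 GiB quota.
	free := percentFree(7<<30, 8<<30)
	fmt.Printf("free: %.1f%%\n", free)

	history := []sample{
		{at: minutes(0), free: free},
		{at: minutes(5), free: free},
		{at: minutes(10), free: free},
	}
	fmt.Println("firing:", firing(history, 15, minutes(10)))
}
```

The `for` clause is what keeps a short spike from paging anyone: the condition has to stay true across evaluations for the whole window.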
Prometheus
# Aggregating up requests per second that has a path label:
- record: instance_path:requests:rate5m
  expr: rate(requests_total{job="myjob"}[5m])

- record: path:requests:rate5m
  expr: sum without (instance)(instance_path:requests:rate5m{job="myjob"})

# Recording rule example 2
# Calculating a request failure ratio and aggregating up to the job-level failure ratio:
- record: instance_path:request_failures:rate5m
  expr: rate(request_failures_total{job="myjob"}[5m])

- record: instance_path:request_failures_per_requests:ratio_rate5m
  expr: |2
      instance_path:request_failures:rate5m{job="myjob"}
    /
      instance_path:requests:rate5m{job="myjob"}

# Aggregate up numerator and denominator, then divide to get path-level ratio.
- record: path:request_failures_per_requests:ratio_rate5m
  expr: |2
      sum without (instance)(instance_path:request_failures:rate5m{job="myjob"})
    /
      sum without (instance)(instance_path:requests:rate5m{job="myjob"})
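What `sum without (instance)` does to labeled series can be shown with a small model. This is a sketch of the aggregation idea only, not of PromQL itself: the `series` type, the `sumWithout` function and the sample values are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// series is one time series at a single instant: a label set plus a value.
type series struct {
	labels map[string]string
	value  float64
}

// sumWithout drops one label and sums all series that agree on every
// remaining label. The map key is the remaining label set, sorted by name.
func sumWithout(in []series, drop string) map[string]float64 {
	out := make(map[string]float64)
	for _, s := range in {
		keys := make([]string, 0, len(s.labels))
		for k := range s.labels {
			if k != drop {
				keys = append(keys, k)
			}
		}
		sort.Strings(keys)

		parts := make([]string, 0, len(keys))
		for _, k := range keys {
			parts = append(parts, k+"="+s.labels[k])
		}
		out[strings.Join(parts, ",")] += s.value
	}
	return out
}

func main() {
	// Made-up per-instance request rates.
	in := []series{
		{labels: map[string]string{"job": "myjob", "instance": "a", "path": "/x"}, value: 1.5},
		{labels: map[string]string{"job": "myjob", "instance": "b", "path": "/x"}, value: 2.5},
		{labels: map[string]string{"job": "myjob", "instance": "a", "path": "/y"}, value: 1},
	}
	for k, v := range sumWithout(in, "instance") {
		fmt.Printf("{%s} %v\n", k, v)
	}
}
```

This is also why the failure ratio above aggregates numerator and denominator separately before dividing: summing per-instance ratios would give a different and misleading number.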
Fluentbit
Treasure Data "fluentd --> fluentbit" (2014).
super fast, lightweight, and highly scalable logging and metrics processor
Fluentbit
![](https://s3.amazonaws.com/media-p.slid.es/uploads/81113/images/9346819/pasted-from-clipboard.png)
Vector.dev
![](https://s3.amazonaws.com/media-p.slid.es/uploads/81113/images/9346826/pasted-from-clipboard.png)
Vector.dev
## Fluent source
{{- if has .vector_enable_fluent $enabled }}
[sources.in_fluent]
type = "fluent"
address = "0.0.0.0:24224"
connection_limit = 2000
keepalive.time_secs = 30
receive_buffer_bytes = 1048576
tls.enabled = true
tls.verify_certificate = true
tls.ca_file = "/corp/secrets/identity/client_ca_with_fluent.crt"
tls.crt_file = "/corp/secrets/identity/server.crt"
tls.key_file = "/corp/secrets/identity/server.key"
{{- else }}
[sources.in_fluent]
type = "file"
include = ["/dev/null"]
{{- end }}
[transforms.remap_fluent_throttle_key]
type = "remap"
inputs = ["in_fluent"]
source = '''
  if match(.tag, r'^alert\..*$') ?? false {
    ._throttle_key, err = join([.labels.cluster_name, .labels.alertname], separator: "_")
    if err != null {
      log("Unable to construct throttle key for alert cluster_name=" + to_string!(.labels.cluster_name) + ", alertname="+ to_string!(.labels.alertname) +". Dropping invalid event. " + err, level: "error", rate_limit_secs: 10)
      abort
    }
  } else {
    ._throttle_key, err = join([.cluster_name, .tag], separator: "_")
    if err != null {
      log("Unable to construct throttle key for cluster_name=" + to_string!(.cluster_name) + ", tag="+ to_string!(.tag) +". Dropping invalid event. " + err, level: "error", rate_limit_secs: 10)
      abort
    }
  }
'''
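The branching in the remap transform is easier to follow when written as an ordinary function. The sketch below restates the logic under some assumptions: the `event` type and `throttleKey` function are hypothetical, a prefix check stands in for the `^alert\..*$` regular expression, and an empty field is treated as the failure case, whereas the VRL `join` fails when a field is not a string.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// event holds only the fields the transform reads.
type event struct {
	tag         string
	clusterName string
	labels      map[string]string
}

// throttleKey builds the key used to group events for throttling:
// alerts are keyed by cluster and alert name, everything else by
// cluster and tag. An error means the event should be dropped.
func throttleKey(e event) (string, error) {
	if strings.HasPrefix(e.tag, "alert.") {
		cluster, alert := e.labels["cluster_name"], e.labels["alertname"]
		if cluster == "" || alert == "" {
			return "", errors.New("unable to construct throttle key for alert")
		}
		return cluster + "_" + alert, nil
	}
	if e.clusterName == "" || e.tag == "" {
		return "", errors.New("unable to construct throttle key")
	}
	return e.clusterName + "_" + e.tag, nil
}

func main() {
	// Made-up events.
	events := []event{
		{tag: "alert.etcd", labels: map[string]string{"cluster_name": "c1", "alertname": "EtcdDatabaseSpaceFilling"}},
		{tag: "kube.logs", clusterName: "c1"},
		{tag: "alert.broken"},
	}
	for _, e := range events {
		key, err := throttleKey(e)
		if err != nil {
			fmt.Println("dropped:", err)
			continue
		}
		fmt.Println("key:", key)
	}
}
```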
Observability
![](https://s3.amazonaws.com/media-p.slid.es/uploads/81113/images/9346084/pasted-from-clipboard.png)
Observability
![](https://s3.amazonaws.com/media-p.slid.es/uploads/81113/images/9346874/pasted-from-clipboard.png)
Observability
![](https://s3.amazonaws.com/media-p.slid.es/uploads/81113/images/9346146/pasted-from-clipboard.png)
Observability
![](https://s3.amazonaws.com/media-p.slid.es/uploads/81113/images/9346150/pasted-from-clipboard.png)
Observability
![](https://s3.amazonaws.com/media-p.slid.es/uploads/81113/images/9346180/pasted-from-clipboard.png)
Observability
![](https://s3.amazonaws.com/media-p.slid.es/uploads/81113/images/9346181/pasted-from-clipboard.png)
Observability
![](https://s3.amazonaws.com/media-p.slid.es/uploads/81113/images/9346194/pasted-from-clipboard.png)
Recap
- Forget the "buzzwords"
- Develop code, Deploy with code, Emit metrics, Improve
- Read the documentation, not StackOverflow and YouTube
- Create anything
Look Forward
Dev Ops & SRE
By Petr Michalec