Dev Ops & SRE
for Patek.cz
but that's not what this is about ^^
Petr Michalec
Works as an SRE at F5; previously at Volterra.io, Mirantis, IBM, ...
n(vi)m lover. developer. geek. quad fpv pilot.
all with a passion for the edge thing
On Twitter as @epcim.
development and operation of modern applications and infrastructure
- What you want to be
- What you do when you create
Buzzword
So what is DevOps?
GitOps
SecOps
NetOps
DevOps
DataOps
AIOps
5G
AI
hyperautomation
XR
The app
- a good idea
- write it well
- package it
- distribute it
- run it *
  - on a PC
  - on a phone
  - in a container
  - in the cloud
  - across clouds, edge, IoT
  - ...
Trend
DevOps
- "Dev" stands for development (UX, frontend, backend, ...)
- "Ops" stands for keeping things running (IT operations)
"SRE" is actually a job role; it is also defined as the principles and practices of software engineering applied to infrastructure and operations
So what is DevOps?
pattern
Gene Kim, co-author of "The Phoenix Project: A Novel About IT, DevOps, and Helping Your Business Win."
So what is DevOps?
...being able to deploy and operate both applications and infrastructure as code
A cultural and professional movement, focused on how we build and operate high velocity organizations, born from the experiences of its practitioners.
Or also
How
Why
In short
Application
Logic
In short
Application
Container
Orchestration
- Integration (because microservices)
- Deployment (because infrastructure as code)
- Delivery (because business needs)
- Measure/Improve (to improve)
- Validation (because security, audit)
"Continuous" is a buzzword too
- What tools we use
- Demo
- What managing applications distributed in the cloud looks like
What next?
Glue
Git
A tool for distributed version control of source code.
Originally built for the Linux kernel. (Others: Bzr, Hg, Svn.)
https://git-scm.com/book/cs/v2
Git
git clone https://github.com/GoogleContainerTools/kpt-functions-catalog
git show
git diff
git pull -r
git checkout -b new-branch
git cherry-pick dbd4b43
git commit
git rebase origin/main
git rebase -i HEAD~5
Golang
Programming language from Google (design began in 2007).
High performance and fast development. Powerful standard library.
Golang
package main

import "fmt"

func main() {
	fmt.Println("hello world")
}
$ go build hello-world.go
$ ls
hello-world hello-world.go
$ ./hello-world
hello world
Golang
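// GotplRenderBuf renders the template file t into out using p.Values.
// Needs imports: bytes, fmt, os, text/template, and the sprig package.
func (p *RenderPlugin) GotplRenderBuf(t string, out *bytes.Buffer) error {
	// read the template (os.ReadFile replaces the deprecated ioutil.ReadFile)
	tContent, err := os.ReadFile(t)
	if err != nil {
		return fmt.Errorf("read template failed: %w", err)
	}

	// init: sprig's default functions plus the custom ones
	fMap := sprig.TxtFuncMap()
	for k, v := range SprigCustomFuncs {
		fMap[k] = v
	}

	// note: template.Must panics if the template does not parse
	tpl := template.Must(
		template.New(t).Funcs(fMap).Parse(string(tContent)),
	)

	// render
	if err := tpl.Execute(out, p.Values); err != nil {
		return err
	}
	return nil
}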
- Python
- JavaScript / TypeScript
It also depends on what for...
Terraform
A tool that uses a declarative language (DSL), idempotent configuration, and a CLI to manage remote ...... through APIs.
https://registry.terraform.io/browse/providers
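What "declarative" and "idempotent" mean here can be illustrated with a toy plan/apply loop. This is only a sketch under simplifying assumptions, not how Terraform is implemented: the `State` type, the `Plan` and `Apply` functions, and the resource names are all hypothetical, and resources are reduced to plain strings.

```go
package main

import (
	"fmt"
	"sort"
)

// State maps a resource name to its configuration (hypothetical, simplified).
type State map[string]string

// Plan compares the desired state with the actual one and returns the
// actions needed to converge, sorted so the output is stable.
func Plan(desired, actual State) []string {
	var actions []string
	for name, cfg := range desired {
		cur, ok := actual[name]
		switch {
		case !ok:
			actions = append(actions, "create "+name)
		case cur != cfg:
			actions = append(actions, "update "+name)
		}
	}
	for name := range actual {
		if _, ok := desired[name]; !ok {
			actions = append(actions, "delete "+name)
		}
	}
	sort.Strings(actions)
	return actions
}

// Apply makes actual match desired. A real tool would call a remote API here.
func Apply(desired, actual State) {
	for name := range actual {
		if _, ok := desired[name]; !ok {
			delete(actual, name)
		}
	}
	for name, cfg := range desired {
		actual[name] = cfg
	}
}

func main() {
	desired := State{"bucket": "public-read", "zone": "example.test"}
	actual := State{"bucket": "private", "old-vm": "t2.micro"}

	fmt.Println(Plan(desired, actual)) // first run: changes are needed
	Apply(desired, actual)
	fmt.Println(Plan(desired, actual)) // second run: nothing left to do
}
```

The point of the second `Plan` call is idempotency: applying the same configuration again produces no further changes.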
Terraform
variable "aws_region" {
  default = "eu-west-1"
}

variable "domain" {
  default = "my_domain"
}

provider "aws" {
  region = var.aws_region
}
# Note: The bucket name needs to carry the same name as the domain!
# http://stackoverflow.com/a/5048129/2966951
resource "aws_s3_bucket" "site" {
  bucket = var.domain
  acl    = "public-read"

  policy = <<EOF
{
  "Version":"2008-10-17",
  "Statement":[{
    "Sid":"AllowPublicRead",
    "Effect":"Allow",
    "Principal": {"AWS": "*"},
    "Action":["s3:GetObject"],
    "Resource":["arn:aws:s3:::${var.domain}/*"]
  }]
}
EOF

  website {
    index_document = "index.html"
  }
}
# Note: Creating this route53 zone is not enough. The domain's name servers need to point to the NS
# servers of the route53 zone. Otherwise the DNS lookup will fail.
# To verify that the dns lookup succeeds: `dig site @nameserver`
resource "aws_route53_zone" "main" {
  name = var.domain
}

resource "aws_route53_record" "root_domain" {
  zone_id = aws_route53_zone.main.zone_id
  name    = var.domain
  type    = "A"

  alias {
    name                   = aws_cloudfront_distribution.cdn.domain_name
    zone_id                = aws_cloudfront_distribution.cdn.hosted_zone_id
    evaluate_target_health = false
  }
}
resource "aws_cloudfront_distribution" "cdn" {
  origin {
    origin_id   = var.domain
    domain_name = "${var.domain}.s3.amazonaws.com"
  }

  # If using route53 aliases for DNS we need to declare it here too, otherwise we'll get 403s.
  aliases = [var.domain]

  enabled             = true
  default_root_object = "index.html"

  default_cache_behavior {
    allowed_methods  = ["GET", "HEAD", "OPTIONS"]
    cached_methods   = ["GET", "HEAD"]
    target_origin_id = var.domain

    forwarded_values {
      query_string = true
      cookies {
        forward = "none"
      }
    }

    viewer_protocol_policy = "allow-all"
    min_ttl                = 0
    default_ttl            = 3600
    max_ttl                = 86400
  }

  # The cheapest priceclass
  price_class = "PriceClass_100"

  # This is required to be specified even if it's not used.
  restrictions {
    geo_restriction {
      restriction_type = "none"
      locations        = []
    }
  }

  viewer_certificate {
    cloudfront_default_certificate = true
  }
}
output "s3_website_endpoint" {
  value = aws_s3_bucket.site.website_endpoint
}

output "route53_domain" {
  value = aws_route53_record.root_domain.fqdn
}

output "cdn_domain" {
  value = aws_cloudfront_distribution.cdn.domain_name
}
Ansible
Also a tool with declarative configuration and a CLI; it too lets you manage remote "servers, routers".
So-called config management.
https://www.guru99.com/ansible-tutorial.html
(Others: Salt, https://saltproject.io/ ).
Ansible
# playbook-base.yaml
- name: base
  become: true
  become_method: sudo
  hosts: all
  roles:
    - sshd
    - users
    - system
    - netplan
    - network
    #- hardening
    #- cleanup
  serial: "{{ batch_size|default(10) }}"
Ansible
- name: sshd configuration file
  template:
    src: sshd_config.metal.j2
    dest: "{{ sshdconfig }}"
    owner: 0
    group: 0
    mode: 0600
    validate: "sshd -T -f %s"
    backup: yes
  vars:
    sshd_allow_users: "{{ (admin_users + autom_users) | join(' ') }} "
  notify:
    - restart sshd
  # when: ansible_virtualization_role != "guest" or ansible_virtualization_type != "docker"
GitLab CI
GitLab is a DevOps platform! Hooray.
A server for Git repositories. Web UI. Issue tracking. And, above all, CI.
(Others: GitHub https://github.com, Gitea https://gitea.io ).
Argo CD
Declarative Continuous Delivery. For GitOps. We get it...
"infrastructure = code"; a screwdriver that installs that code into Kubernetes.
https://argoproj.github.io/cd/
Prometheus
Originally built at SoundCloud (2012).
multi-dimensional data model, operational simplicity, scalable data collection, and a powerful query language
Prometheus
{
  name: 'etcd-service',
  rules: [
    {
      alert: 'EtcdDatabaseSpaceFilling',
      expr: |||
        100 - (etcd_mvcc_db_total_size_in_bytes / etcd_server_quota_backend_bytes) * 100 < 15
      ||| % $._config,
      'for': '10m',
      labels: {
        severity: 'major',
        identifier: '{{ $labels.instance }}',
        group: 'Infrastructure',
        service_name: 'etcd',
        tenant: 'ves-sre',
      },
      annotations: {
        display_name: 'Database Error',
        description: 'Etcd database {{ $labels.instance }} is filling up. Only {{ $value }}% of space is available.',
      },
      docs:: {
        name: |||
          TODO
        |||,
        description: |||
          Etcd database is filling up.
        |||,
        action: |||
          This has to be solved in next business day working hours by L2.
        |||,
      },
    },
Prometheus
# Aggregating up requests per second that has a path label:
- record: instance_path:requests:rate5m
  expr: rate(requests_total{job="myjob"}[5m])

- record: path:requests:rate5m
  expr: sum without (instance)(instance_path:requests:rate5m{job="myjob"})

Recording Rule Example 2
================================
# Calculating a request failure ratio and aggregating up to the job-level failure ratio:
- record: instance_path:request_failures:rate5m
  expr: rate(request_failures_total{job="myjob"}[5m])

- record: instance_path:request_failures_per_requests:ratio_rate5m
  expr: |2
      instance_path:request_failures:rate5m{job="myjob"}
    /
      instance_path:requests:rate5m{job="myjob"}

# Aggregate up numerator and denominator, then divide to get path-level ratio.
- record: path:request_failures_per_requests:ratio_rate5m
  expr: |2
      sum without (instance)(instance_path:request_failures:rate5m{job="myjob"})
    /
      sum without (instance)(instance_path:requests:rate5m{job="myjob"})
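Why the last rule aggregates the numerator and denominator before dividing can be shown with a little arithmetic. The sketch below uses made-up per-instance rates; `InstanceRates`, `PathRatio`, and `MeanOfRatios` are hypothetical names, and returning 0 when there is no traffic is an assumption of this example, not something the rules above do.

```go
package main

import "fmt"

// InstanceRates holds per-instance 5m rates for one path (made-up numbers).
type InstanceRates struct {
	Failures float64 // failed requests per second
	Requests float64 // all requests per second
}

// PathRatio sums numerator and denominator across instances and then
// divides, like the path-level rule does with `sum without (instance)`.
func PathRatio(instances []InstanceRates) float64 {
	var failures, requests float64
	for _, in := range instances {
		failures += in.Failures
		requests += in.Requests
	}
	if requests == 0 {
		return 0 // assumption: report 0 when there is no traffic
	}
	return failures / requests
}

// MeanOfRatios averages the per-instance ratios; shown only for contrast.
func MeanOfRatios(instances []InstanceRates) float64 {
	var sum float64
	for _, in := range instances {
		sum += in.Failures / in.Requests
	}
	return sum / float64(len(instances))
}

func main() {
	instances := []InstanceRates{
		{Failures: 1, Requests: 2},  // quiet instance, half of its requests fail
		{Failures: 1, Requests: 98}, // busy instance, about 1% of requests fail
	}
	fmt.Printf("sum then divide: %.3f\n", PathRatio(instances))
	fmt.Printf("mean of ratios:  %.3f\n", MeanOfRatios(instances))
}
```

Averaging the per-instance ratios lets the quiet instance dominate the result, while summing first weights every request equally.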
Fluentbit
Treasure Data "fluentd --> fluentbit" (2014).
super fast, lightweight, and highly scalable logging and metrics processor
Vector.dev
## Fluent source
{{- if has .vector_enable_fluent $enabled }}
[sources.in_fluent]
type = "fluent"
address = "0.0.0.0:24224"
connection_limit = 2000
keepalive.time_secs = 30
receive_buffer_bytes = 1048576
tls.enabled = true
tls.verify_certificate = true
tls.ca_file = "/corp/secrets/identity/client_ca_with_fluent.crt"
tls.crt_file = "/corp/secrets/identity/server.crt"
tls.key_file = "/corp/secrets/identity/server.key"
{{- else }}
[sources.in_fluent]
type = "file"
include = ["/dev/null"]
{{- end }}
[transforms.remap_fluent_throttle_key]
type = "remap"
inputs = ["in_fluent"]
source = '''
  if match(.tag, r'^alert\..*$') ?? false {
    ._throttle_key, err = join([.labels.cluster_name, .labels.alertname], separator: "_")
    if err != null {
      log("Unable to construct throttle key for alert cluster_name=" + to_string!(.labels.cluster_name) + ", alertname="+ to_string!(.labels.alertname) +". Dropping invalid event. " + err, level: "error", rate_limit_secs: 10)
      abort
    }
  } else {
    ._throttle_key, err = join([.cluster_name, .tag], separator: "_")
    if err != null {
      log("Unable to construct throttle key for cluster_name=" + to_string!(.cluster_name) + ", tag="+ to_string!(.tag) +". Dropping invalid event. " + err, level: "error", rate_limit_secs: 10)
      abort
    }
  }
'''
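The remap transform above picks a throttle key per event. The same decision can be sketched in Go for readers who do not know VRL. This is an approximation under stated assumptions: the `Event` type and `ThrottleKey` function are hypothetical, the regular expression is simplified to a prefix check, and a missing field is modelled as an empty string where VRL's `join` would return an error.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Event is a hypothetical, simplified event with only the fields the key needs.
type Event struct {
	Tag         string
	ClusterName string
	Labels      map[string]string
}

// ThrottleKey builds the key used to group events for throttling.
// Alert events are keyed by cluster and alert name, everything else by
// cluster and tag. An error means the event should be dropped.
func ThrottleKey(e Event) (string, error) {
	var parts []string
	if strings.HasPrefix(e.Tag, "alert.") {
		parts = []string{e.Labels["cluster_name"], e.Labels["alertname"]}
	} else {
		parts = []string{e.ClusterName, e.Tag}
	}
	for _, p := range parts {
		if p == "" {
			return "", errors.New("missing field for throttle key")
		}
	}
	return strings.Join(parts, "_"), nil
}

func main() {
	events := []Event{
		{Tag: "alert.etcd", Labels: map[string]string{"cluster_name": "c1", "alertname": "EtcdDatabaseSpaceFilling"}},
		{Tag: "app.log", ClusterName: "c1"},
		{Tag: "alert.broken"}, // no labels: cannot build a key
	}
	for _, e := range events {
		key, err := ThrottleKey(e)
		if err != nil {
			fmt.Println("dropping event:", err)
			continue
		}
		fmt.Println(key)
	}
}
```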
Observability
Recap
- Forget the "buzzwords"
- Develop code, Deploy with code, Emit metrics, Improve
- Read the documentation, not StackOverflow and YouTube
- Create anything
Look Forward
Dev Ops & SRE
By Petr Michalec