AWS Sr. Cloud Support Engineer (Container and DevOps)
All five AWS certifications (2017)
Started my Kubernetes journey in 2017 (EKS)
Author of《Mastering Elastic Kubernetes Service on AWS》
Certifications: CKA, CKS, CKAD (CNCF) + Solutions Architect Professional, DevOps Engineer Professional (AWS)
✍️ Created EasonTechTalk.com
🏃 Marathon runner
🤿 PADI AOW diver
https://easoncao.com/about/
Validating Webhook
Mutating Webhook
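To make the difference between the two webhook types concrete, here is a minimal sketch of the AdmissionReview responses each kind sends back to the API server. It assumes the `admission.k8s.io/v1` response shape; the helper names, the UID, and the example patch are hypothetical and only for illustration.

```python
import base64
import json


def validating_response(uid: str, allowed: bool, message: str = "") -> dict:
    """Build an AdmissionReview response that only allows or denies."""
    response = {"uid": uid, "allowed": allowed}
    if not allowed:
        # A denial can carry a status that is surfaced to the client.
        response["status"] = {"code": 403, "message": message}
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }


def mutating_response(uid: str, patch_ops: list) -> dict:
    """Build an AdmissionReview response that allows and attaches a JSON Patch."""
    # The patch is a JSON Patch document, base64-encoded in the response.
    patch = base64.b64encode(json.dumps(patch_ops).encode()).decode()
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {
            "uid": uid,
            "allowed": True,
            "patchType": "JSONPatch",
            "patch": patch,
        },
    }


if __name__ == "__main__":
    # Hypothetical UID and patch, not taken from a real request.
    print(json.dumps(validating_response("example-uid", False, "denied"), indent=2))
    print(json.dumps(mutating_response(
        "example-uid",
        [{"op": "add", "path": "/metadata/labels/example", "value": "demo"}],
    ), indent=2))
```

A validating webhook can only answer yes or no, while a mutating webhook answers yes and may also return a patch to apply to the object.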
$ kubectl describe MutatingWebhookConfiguration/aws-load-balancer-webhook
Name: aws-load-balancer-webhook
API Version: admissionregistration.k8s.io/v1
Kind: MutatingWebhookConfiguration
Webhooks:
  Admission Review Versions:
    v1beta1
  Client Config:
    Service:
      Name: aws-load-balancer-webhook-service
      Namespace: kube-system
      Path: /mutate-v1-pod
      Port: 443
  Failure Policy: Fail
  Name: mpod.elbv2.k8s.aws
  Namespace Selector:
    Match Expressions:
      Key: elbv2.k8s.aws/pod-readiness-gate-inject
      Operator: In
      Values:
        enabled
  Object Selector:
    Match Expressions:
      Key: app.kubernetes.io/name
      Operator: NotIn
      Values:
        aws-load-balancer-controller
  Rules:
    API Versions:
      v1
    Operations:
      CREATE
    Resources:
      pods
...
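The Namespace Selector and Object Selector above decide whether the API server calls the webhook for a given Pod. The following sketch models that decision with standard label-selector semantics. It is a simplification that ignores the Rules section, and the function names and sample labels are hypothetical.

```python
def expression_matches(expr: dict, labels: dict) -> bool:
    """Evaluate one matchExpressions entry against a label map."""
    key = expr["key"]
    operator = expr["operator"]
    values = expr.get("values", [])
    if operator == "In":
        return key in labels and labels[key] in values
    if operator == "NotIn":
        # A missing key also satisfies NotIn.
        return key not in labels or labels[key] not in values
    if operator == "Exists":
        return key in labels
    if operator == "DoesNotExist":
        return key not in labels
    raise ValueError(f"unsupported operator: {operator}")


# Selectors copied from the mpod.elbv2.k8s.aws webhook shown above.
NAMESPACE_SELECTOR = [
    {"key": "elbv2.k8s.aws/pod-readiness-gate-inject", "operator": "In", "values": ["enabled"]},
]
OBJECT_SELECTOR = [
    {"key": "app.kubernetes.io/name", "operator": "NotIn", "values": ["aws-load-balancer-controller"]},
]


def webhook_is_called(namespace_labels: dict, pod_labels: dict) -> bool:
    """Both selectors must match before the webhook is invoked."""
    return (
        all(expression_matches(e, namespace_labels) for e in NAMESPACE_SELECTOR)
        and all(expression_matches(e, pod_labels) for e in OBJECT_SELECTOR)
    )


if __name__ == "__main__":
    labelled_ns = {"elbv2.k8s.aws/pod-readiness-gate-inject": "enabled"}
    print(webhook_is_called(labelled_ns, {"app": "web"}))
    print(webhook_is_called({}, {"app": "web"}))
```

The Object Selector excludes the controller's own Pods, which avoids a situation where the webhook would have to admit the very Pod that serves it.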
error when patching "istio-gateway.yaml": Internal error occurred: failed calling webhook "validate.kyverno.svc-fail": failed to call webhook: Post "https://kyverno-svc.default.svc:443/validate/fail?timeout=10s": context deadline exceeded
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedDeployModel 53m (x9 over 63m) ingress (combined from similar events): Failed deploy model due to Internal error occurred: failed calling webhook "mtargetgroupbinding.elbv2.k8s.aws": Post "https://aws-load-balancer-webhook-service.kube-system.svc:443/mutate-elbv2-k8s-aws-v1beta1-targetgroupbinding?timeout=30s": x509: certificate has expired or is not yet valid: current time 2022-03-03T07:37:16Z is after 2022-02-26T11:24:26Z
E0119 11:37:53.532226 1 shared_informer.go:243] unable to sync caches for garbage collector
E0119 11:37:53.532261 1 garbagecollector.go:228] timed out waiting for dependency graph builder sync during GC sync (attempt 73)
I0119 11:37:54.680276 1 request.go:645] Throttling request took 1.047002085s, request: GET:https://10.150.233.43:6443/apis/configuration.konghq.com/v1beta1?timeout=32s
I0119 11:37:54.831942 1 shared_informer.go:240] Waiting for caches to sync for garbage collector
I0119 11:38:04.722878 1 request.go:645] Throttling request took 1.860914441s, request: GET:https://10.150.233.43:6443/apis/acme.cert-manager.io/v1alpha2?timeout=32s
E0119 11:38:04.861576 1 shared_informer.go:243] unable to sync caches for resource quota
E0119 11:38:04.861687 1 resource_quota_controller.go:447] timed out waiting for quota monitor sync
Parameter | Default |
---|---|
--concurrent-resource-quota-syncs | 5 |
--resource-quota-sync-period | 5m0s |
apiVersion: v1
kind: Pod
metadata:
  name: memory-demo
  namespace: pod-resources-example
spec:
  resources:
    requests:
      memory: "100Mi"
    limits:
      memory: "200Mi"
  containers:
  - name: memory-demo-ctr
    image: nginx
    command: ["stress"]
    args: ["--vm", "1", "--vm-bytes", "150M", "--vm-hang", "1"]
$ kubectl get events -n curl
...
23m Normal SuccessfulCreate replicaset/curl-9454cc476 Created pod: curl-9454cc476-khp45
22m Warning FailedCreate replicaset/curl-9454cc476 Error creating: Internal error occurred: failed calling webhook "namespace.sidecar-injector.istio.io": failed to call webhook: Post "https://istiod.istio-system.svc:443/inject?timeout=10s": dial tcp 10.96.44.51:443: connect: connection refused
$ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
example-pod-1234 0/1 Evicted 0 5m
example-pod-5678 0/1 Evicted 0 10m
example-pod-9012 0/1 Evicted 0 7m
example-pod-3456 0/1 Evicted 1 15m
example-pod-7890 0/1 Evicted 0 3m
Resource exhaustion (OOM, DiskPressure or PIDPressure)
-> Node-pressure Eviction
-> Webhook Pod terminated
I0623 12:15:42.123456 1 job_controller.go:256] Syncing Job default/example-job
E0623 12:15:42.124789 1 job_controller.go:276] Error syncing Job "default/example-job": Internal error occurred: failed calling webhook "validate.jobs.example.com": Post "https://webhook-service.default.svc:443/validate?timeout=10s": dial tcp 10.96.0.42:443: connect: connection refused
W0623 12:15:42.125000 1 controller.go:285] Retrying webhook request after failure
E0623 12:15:52.130123 1 job_controller.go:276] Error syncing Job "default/example-job": Internal error occurred: failed calling webhook "validate.jobs.example.com": Post "https://webhook-service.default.svc:443/validate?timeout=10s": dial tcp 10.96.0.42:443: connect: connection refused
W0623 12:15:52.130456 1 controller.go:285] Retrying webhook request after failure
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreate 18s (x14 over 60s) daemonset-controller Error creating: Internal error occurred: failed calling webhook "validate.kyverno.svc-fail": failed to call webhook: Post "https://kyverno-svc.kyverno.svc:443/validate/fail?timeout=10s": no endpoints available for service "kyverno-svc"
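The error messages shown so far all share the prefix `failed calling webhook "<name>"` and differ only in the trailing cause. As a rough triage aid, the sketch below extracts the webhook name and maps the tail of the message to one of the four causes seen in these examples. The cause labels and the function name are my own, and the substring list is not exhaustive.

```python
import re

# Substrings taken from the error messages in this deck, mapped to a short cause label.
CAUSES = [
    ("context deadline exceeded", "timeout"),
    ("x509: certificate has expired", "expired certificate"),
    ("connection refused", "connection refused"),
    ("no endpoints available", "no endpoints"),
]

WEBHOOK_NAME = re.compile(r'failed calling webhook "([^"]+)"')


def classify(message: str) -> tuple:
    """Return (webhook name or None, cause label) for an API server error message."""
    match = WEBHOOK_NAME.search(message)
    name = match.group(1) if match else None
    for needle, cause in CAUSES:
        if needle in message:
            return name, cause
    return name, "unknown"


if __name__ == "__main__":
    sample = (
        'Internal error occurred: failed calling webhook "validate.kyverno.svc-fail": '
        'failed to call webhook: Post "https://kyverno-svc.kyverno.svc:443/validate/fail?timeout=10s": '
        'no endpoints available for service "kyverno-svc"'
    )
    print(classify(sample))
```

The cause usually points at where to look next: the webhook Pod's health for refused connections and missing endpoints, the serving certificate for x509 errors, and network paths or webhook latency for timeouts.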
{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "RequestResponse",
  "stage": "ResponseComplete",
  "requestURI": "/api/v1/namespaces/calico-system/services/calico-typha",
  "verb": "update",
  "responseStatus": {
    "metadata": {},
    "status": "Failure",
    "message": "Internal error occurred: failed calling webhook \"validate.kyverno.svc-fail\": failed to call webhook: Post \"https://kyverno-svc.kyverno.svc:443/validate/fail?timeout=10s\": no endpoints available for service \"kyverno-svc\"",
    "reason": "InternalError",
    "details": {
      "causes": [{
        "message": "failed calling webhook \"validate.kyverno.svc-fail\": failed to call webhook: Post \"https://kyverno-svc.kyverno.svc:443/validate/fail?timeout=10s\": no endpoints available for service \"kyverno-svc\""
      }]
    },
    "code": 500
  }
}
$ kubectl get --raw /metrics | grep "apiserver_admission_webhook"
apiserver_admission_webhook_request_total{code="400",name="mpod.elbv2.k8s.aws",operation="CREATE",rejected="true",type="admit"} 17
$ kubectl get --raw /metrics | grep "apiserver_admission_webhook_rejection"
apiserver_admission_webhook_rejection_count{error_type="calling_webhook_error",name="mpod.elbv2.k8s.aws",operation="CREATE",rejection_code="400",type="admit"} 17
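For critical applications and services, check whether they expose Prometheus (or equivalent) metrics for monitoring availability, and define specific monitoring metrics and alert thresholds. Also monitor related resource usage, such as CPU and memory utilization, along with metrics such as network latency.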
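As one way to turn the rejection metric above into an alert, the sketch below parses the text returned by `kubectl get --raw /metrics`, sums `apiserver_admission_webhook_rejection_count` per webhook name, and flags webhooks at or above a threshold. The threshold value and function names are hypothetical. Because the metric is a cumulative counter, a production alert would more likely compare the increase over a time window.

```python
import re

# One Prometheus exposition line: metric{label="value",...} number
LINE = re.compile(r'^(?P<metric>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')

REJECTION_METRIC = "apiserver_admission_webhook_rejection_count"


def rejections_by_webhook(metrics_text: str) -> dict:
    """Sum the rejection counter per webhook name."""
    totals = {}
    for raw in metrics_text.splitlines():
        match = LINE.match(raw.strip())
        if not match or match.group("metric") != REJECTION_METRIC:
            continue
        labels = dict(LABEL.findall(match.group("labels")))
        name = labels.get("name", "")
        totals[name] = totals.get(name, 0.0) + float(match.group("value"))
    return totals


def over_threshold(totals: dict, threshold: float) -> list:
    """Return webhook names whose rejection count is at or above the threshold."""
    return sorted(name for name, count in totals.items() if count >= threshold)


if __name__ == "__main__":
    sample = (
        'apiserver_admission_webhook_rejection_count{error_type="calling_webhook_error",'
        'name="mpod.elbv2.k8s.aws",operation="CREATE",rejection_code="400",type="admit"} 17'
    )
    totals = rejections_by_webhook(sample)
    # The threshold of 10 is an arbitrary example value.
    print(over_threshold(totals, 10))
```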
Kubernetes API Server logs
fields @timestamp, @message, @logStream
| filter @logStream like /kube-apiserver/
| filter @message like 'failed to call webhook'
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: aws-load-balancer-webhook
webhooks:
- clientConfig:
    service:
      name: aws-load-balancer-webhook-service
      namespace: kube-system
      path: /mutate-v1-pod
  failurePolicy: Fail # <--- Replace "Fail" with "Ignore"
  name: mpod.elbv2.k8s.aws
...
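The effect of `failurePolicy` can be summarized in a few lines. The sketch below is a simplified model, not the API server's implementation: when the call to the webhook itself fails, `Fail` rejects the request and `Ignore` lets it through, while a successful call always uses the webhook's own answer. The function names are hypothetical.

```python
def admit(failure_policy: str, call_webhook) -> tuple:
    """Return (allowed, reason) for one admission request.

    call_webhook is a callable that returns True/False, or raises when the
    webhook cannot be reached (timeout, refused connection, bad certificate).
    """
    try:
        allowed = call_webhook()
    except Exception as err:
        if failure_policy == "Ignore":
            # The error is swallowed and the request proceeds unchecked.
            return True, f"webhook error ignored: {err}"
        return False, f"failed calling webhook: {err}"
    return allowed, "webhook responded"


def unreachable_webhook():
    """Simulate a webhook whose backing Pod is down."""
    raise ConnectionRefusedError("connect: connection refused")


if __name__ == "__main__":
    print(admit("Fail", unreachable_webhook))
    print(admit("Ignore", unreachable_webhook))
```

Switching to `Ignore` trades enforcement for availability: requests keep flowing during a webhook outage, but they also skip whatever mutation or validation the webhook would have applied.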
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: "demo-policy.example.com"
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
    - apiGroups: ["apps"]
      apiVersions: ["v1"]
      operations: ["CREATE", "UPDATE"]
      resources: ["deployments"]
  validations:
  - expression: "object.spec.replicas <= 5"
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: "demo-binding-test.example.com"
spec:
  policyName: "demo-policy.example.com"
  validationActions: [Deny]
  matchResources:
    namespaceSelector:
      matchLabels:
        environment: test
Kubernetes v1.30 [stable]
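A ValidatingAdmissionPolicy is evaluated inside the API server as a CEL expression, so there is no webhook Pod, Service, or certificate that can fail. The sketch below mimics the policy and binding above in plain Python to show what gets checked. It is an illustration of the logic only, not how CEL is executed, and the function names and sample objects are hypothetical.

```python
def binding_applies(namespace_labels: dict) -> bool:
    """Mirror the binding's namespaceSelector: matchLabels environment=test."""
    return namespace_labels.get("environment") == "test"


def validate(deployment: dict) -> bool:
    """Mirror the CEL expression: object.spec.replicas <= 5."""
    # Assumption: a missing replicas field is treated as the Deployment default of 1.
    return deployment["spec"].get("replicas", 1) <= 5


def admit(deployment: dict, namespace_labels: dict) -> bool:
    """Deny only when the binding applies and the validation fails."""
    if not binding_applies(namespace_labels):
        return True
    return validate(deployment)


if __name__ == "__main__":
    too_many = {"spec": {"replicas": 6}}
    print(admit(too_many, {"environment": "test"}))
    print(admit(too_many, {"environment": "prod"}))
```

Because the check runs in-process, the failure modes shown earlier (timeouts, refused connections, expired certificates, missing endpoints) do not apply to policies expressed this way.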