EKS Training Camp - Health Checks (3)


Introduction

By default, Kubernetes automatically restarts a container that goes down for any reason. You can build on this by configuring health checks, which include liveness probes and readiness probes. For a detailed explanation of how they work, see the official Kubernetes health check documentation.
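As a quick orientation before the hands-on steps, here is a minimal sketch of my own (not from the workshop) showing where the two probe types sit in a Pod spec; the pod name, image, and probe targets are illustrative assumptions chosen to mirror the examples used later:

# probe-demo is a hypothetical pod, for illustration only
apiVersion: v1
kind: Pod
metadata:
  name: probe-demo
spec:
  containers:
  - name: app
    image: nginx               # any image with an HTTP endpoint and a shell
    livenessProbe:             # failure here makes the kubelet restart the container
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 5
    readinessProbe:            # failure here only marks the pod as not Ready
      exec:
        command: ["cat", "/usr/share/nginx/html/index.html"]
      initialDelaySeconds: 5
      periodSeconds: 5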

Liveness probes

1. Create the working directory

mkdir -p ~/environment/healthchecks

Create the YAML file:

cd ~/environment/healthchecks
cat <<EoF > liveness-app.yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-app
spec:
  containers:
  - name: liveness
    image: brentley/ecsdemo-nodejs
    livenessProbe:
      httpGet:
        path: /health
        port: 3000
      initialDelaySeconds: 5
      periodSeconds: 5
EoF
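Optionally (this check is my addition, not part of the original steps), you can validate the manifest locally before applying it:

# Client-side validation only; nothing is created in the cluster
kubectl apply --dry-run=client -f liveness-app.yaml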

2. Deploy and confirm the pod is Ready

cd ~/environment/healthchecks/
kubectl apply -f liveness-app.yaml
kubectl get pod liveness-app

The output will look similar to the following:

NAME           READY   STATUS              RESTARTS   AGE
liveness-app   0/1     ContainerCreating   0          1s
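If you prefer to block until the pod is actually Ready instead of repeatedly running kubectl get, one option (an addition of mine, not in the original text) is:

# Wait up to 60 seconds for the pod's Ready condition to become true
kubectl wait --for=condition=Ready pod/liveness-app --timeout=60s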

Next, check the pod's event history:

kubectl describe pod liveness-app

The output will look similar to the following:

Events:
  Type    Reason     Age   From                                                  Message
  ----    ------     ----  ----                                                  -------
  Normal  Scheduled  36s   default-scheduler                                     Successfully assigned default/liveness-app to ip-172-31-34-171.eu-west-1.compute.internal
  Normal  Pulling    35s   kubelet, ip-172-31-34-171.eu-west-1.compute.internal  Pulling image "brentley/ecsdemo-nodejs"
  Normal  Pulled     34s   kubelet, ip-172-31-34-171.eu-west-1.compute.internal  Successfully pulled image "brentley/ecsdemo-nodejs" in 877.182203ms
  Normal  Created    34s   kubelet, ip-172-31-34-171.eu-west-1.compute.internal  Created container liveness
  Normal  Started    34s   kubelet, ip-172-31-34-171.eu-west-1.compute.internal  Started container liveness

3. Force a health-check failure

kubectl get pod liveness-app
kubectl exec -it liveness-app -- /bin/kill -s SIGUSR1 1
kubectl get pod liveness-app
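To confirm that the kubelet actually acted on the failed probe, the commands below (my own additions, not part of the original walkthrough) show the liveness-probe failure events and the container restart count:

# Show the liveness-probe failure events recorded for the pod
kubectl describe pod liveness-app | grep -i -A2 unhealthy

# Show how many times the container has been restarted
kubectl get pod liveness-app -o jsonpath='{.status.containerStatuses[0].restartCount}{"\n"}'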

4. Trace the logs

After running the commands in the previous step, the Node.js application enters debug mode and no longer responds to the health-check requests, so the liveness probe fails and the kubelet treats the pod as broken and restarts it. We can follow the details in the logs:

kubectl logs liveness-app
kubectl logs liveness-app --previous

There is a lot of log output; the --previous flag shows the logs of the terminated container instance, and in it you will find a section similar to the following:

::ffff:172.31.34.171 - - [21/May/2021:06:49:06 +0000] "GET /health HTTP/1.1" 200 17 "-" "kube-probe/1.20+"
::ffff:172.31.34.171 - - [21/May/2021:06:49:11 +0000] "GET /health HTTP/1.1" 200 17 "-" "kube-probe/1.20+"
::ffff:172.31.34.171 - - [21/May/2021:06:49:16 +0000] "GET /health HTTP/1.1" 200 17 "-" "kube-probe/1.20+"
::ffff:172.31.34.171 - - [21/May/2021:06:49:21 +0000] "GET /health HTTP/1.1" 200 17 "-" "kube-probe/1.20+"
::ffff:172.31.34.171 - - [21/May/2021:06:49:26 +0000] "GET /health HTTP/1.1" 200 17 "-" "kube-probe/1.20+"
::ffff:172.31.34.171 - - [21/May/2021:06:49:31 +0000] "GET /health HTTP/1.1" 200 17 "-" "kube-probe/1.20+"
::ffff:172.31.34.171 - - [21/May/2021:06:49:36 +0000] "GET /health HTTP/1.1" 200 18 "-" "kube-probe/1.20+"
::ffff:172.31.34.171 - - [21/May/2021:06:49:41 +0000] "GET /health HTTP/1.1" 200 18 "-" "kube-probe/1.20+"
Starting debugger agent.
Debugger listening on [::]:5858

Readiness probes

1. Create the deployment

cd ~/environment/healthchecks/
cat <<EoF > readiness-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: readiness-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: readiness-deployment
  template:
    metadata:
      labels:
        app: readiness-deployment
    spec:
      containers:
      - name: readiness-deployment
        image: alpine
        command: ["sh", "-c", "touch /tmp/healthy && sleep 86400"]
        readinessProbe:
          exec:
            command:
            - cat
            - /tmp/healthy
          initialDelaySeconds: 5
          periodSeconds: 3
EoF

2. Deploy and check the deployment

cd ~/environment/healthchecks/
kubectl apply -f readiness-deployment.yaml
kubectl get pods -l app=readiness-deployment
kubectl describe deployment readiness-deployment | grep Replicas:

The replica status should look like this:

Replicas: 3 desired | 3 updated | 3 total | 3 available | 0 unavailable
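If the replicas are not all available yet, one way to wait for the rollout to finish (a convenience of mine, not shown in the original walkthrough) is:

# Block until all replicas report Ready, or fail after 60 seconds
kubectl rollout status deployment/readiness-deployment --timeout=60s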

3. Force a health-check failure

Manually deleting the /tmp/healthy file that the probe reads will cause the readiness check to fail. Note that the pod name in the commands below will be different in your cluster, so substitute one of your own pod names.

# kubectl exec -it <POD_NAME> -- rm /tmp/healthy
kubectl exec -it readiness-deployment-644f56898d-4mcdk -- rm /tmp/healthy
kubectl get pods -l app=readiness-deployment
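Since the pod name differs in every cluster, here is a minimal sketch of my own (using standard kubectl flags) that picks one pod from the deployment by label instead of copying the name by hand:

# Grab the name of the first pod carrying the deployment's label
POD_NAME=$(kubectl get pods -l app=readiness-deployment -o jsonpath='{.items[0].metadata.name}')

# Delete the file the readiness probe checks for inside that pod
kubectl exec -it "$POD_NAME" -- rm /tmp/healthy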

Now check the replica status again:

kubectl describe deployment readiness-deployment | grep Replicas:

You will see that one replica is now unavailable:

Replicas: 3 desired | 3 updated | 3 total | 2 available | 1 unavailable

4. Fix the failure

To bring the health check back to normal, simply exec into that pod and recreate the file by hand:

kubectl exec -it readiness-deployment-644f56898d-4mcdk -- touch /tmp/healthy
kubectl get pods -l app=readiness-deployment
kubectl describe deployment readiness-deployment | grep Replicas:

After recovery, all 3 replicas are available again:

Replicas: 3 desired | 3 updated | 3 total | 3 available | 0 unavailable

Clean up

When you no longer need this environment, delete it as follows:

cd ~/environment/healthchecks/
kubectl delete -f liveness-app.yaml
kubectl delete -f readiness-deployment.yaml
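To confirm everything is gone, an optional check of mine (not part of the original steps):

# Both commands should return nothing once the resources are deleted
kubectl get pod liveness-app --ignore-not-found
kubectl get pods -l app=readiness-deployment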
