Kubernetes Core Concepts (Part 6): Pod


I. Pod YAML Resource Manifest Format


apiVersion: v1       # Required. API version, e.g. v1
kind: Pod            # Required. Resource type: Pod
metadata:            # Required. Metadata
  name: string       # Required. Pod name
  namespace: string  # Namespace the Pod belongs to (defaults to "default" if omitted)
  labels:            # Custom labels (key: value pairs)
    key: value
  annotations:       # Custom annotations (key: value pairs)
    key: value
spec:                # Required. Detailed definition of the containers in the Pod
  containers:        # Required. List of containers in the Pod
  - name: string     # Required. Container name
    image: string    # Required. Container image name
    imagePullPolicy: [Always | Never | IfNotPresent]  # Image pull policy: Always always pulls the image, IfNotPresent prefers the local image and pulls only if it is missing, Never only ever uses the local image
    command: [string]    # Container start command list; if not specified, the command baked into the image is used
    args: [string]       # Arguments for the start command
    workingDir: string   # Container working directory
    volumeMounts:        # Volumes mounted into the container
    - name: string       # Name of a shared volume defined on the Pod; must reference a volume declared in the volumes[] section
      mountPath: string  # Absolute mount path inside the container (should be shorter than 512 characters)
      readOnly: boolean  # Whether the mount is read-only
    ports:               # List of ports to expose
    - name: string       # Port name
      containerPort: int # Port the container listens on
      hostPort: int      # Port to listen on on the host; defaults to the same value as containerPort
      protocol: string   # Port protocol, TCP or UDP; defaults to TCP
    env:                 # Environment variables set before the container runs
    - name: string       # Environment variable name
      value: string      # Environment variable value
    resources:           # Resource limits and requests
      limits:            # Resource limits
        cpu: string      # CPU limit in cores; passed to the docker run --cpu-shares parameter
        memory: string   # Memory limit, e.g. in Mi/Gi; passed to the docker run --memory parameter
      requests:          # Resource requests
        cpu: string      # CPU request, the amount available to the container at start
        memory: string   # Memory request, the amount available to the container at start
    livenessProbe:       # Health check for this container; if the probe fails repeatedly the container is restarted automatically. Methods are exec, httpGet and tcpSocket; configure only one of them per container
      exec:              # exec-style health check
        command: [string]  # Command or script the exec probe runs
      httpGet:           # httpGet-style health check; requires path and port
        path: string
        port: number
        host: string
        scheme: string
        httpHeaders:
        - name: string
          value: string
      tcpSocket:         # tcpSocket-style health check
        port: number
      initialDelaySeconds: 0  # Seconds to wait after the container starts before the first probe
      timeoutSeconds: 0       # Probe timeout in seconds; defaults to 1
      periodSeconds: 0        # Probe interval in seconds; defaults to 10
      successThreshold: 0
      failureThreshold: 0
    securityContext:
      privileged: false
  restartPolicy: [Always | Never | OnFailure]  # Pod restart policy: Always means the kubelet restarts the Pod no matter how it terminated, OnFailure restarts only when the Pod exits with a non-zero code, Never never restarts it
  nodeSelector: object   # Schedule the Pod onto nodes carrying these labels, given as key: value pairs
  imagePullSecrets:      # Secrets used when pulling the image
  - name: string
  hostNetwork: false     # Whether to use host networking; defaults to false, true means use the host's network namespace
  volumes:               # Shared volumes defined on the Pod
  - name: string         # Volume name (many volume types exist)
    emptyDir: {}         # emptyDir volume: a temporary directory sharing the Pod's lifetime; the value is empty
    hostPath:            # hostPath volume: mounts a directory from the host the Pod runs on
      path: string       # Host directory that will be mounted into the container
    secret:              # secret volume: mounts a predefined Secret object into the container
      secretName: string
      items:
      - key: string
        path: string
    configMap:           # configMap volume: mounts a predefined ConfigMap object into the container
      name: string
      items:
      - key: string
        path: string

II. Creating and Deleting Pods

1. Creating a Pod with a Command

  • In Kubernetes versions before v1.18, kubectl run created a Deployment controller
  • Since v1.18, kubectl run creates a Pod

1.1 Create a pod named pod-nginx

[root@k8s-master01 ~]# kubectl run pod-nginx --image=nginx:latest
pod/pod-nginx created

1.2 Verify

[root@k8s-master01 ~]# kubectl get pods
NAME        READY   STATUS    RESTARTS   AGE
pod-nginx   1/1     Running   0          5m4s

2. Creating a Pod from a YAML File

2.1 Create the YAML file

apiVersion: v1
kind: Pod
metadata:
  name: "pod-stress"
  namespace: default
spec:
  containers:
  - name: c1
    image: "polinux/stress"
    command: ["stress"]
    args: ["--vm","1","--vm-bytes","150M","--vm-hang","1"]

Explanation:

apiVersion: v1        # API version
kind: Pod            # resource type is Pod
metadata:
  name: "pod-stress"    # pod name
  namespace: default    # namespace it belongs to
spec:
  containers:            # containers contained in the pod
  - name: c1            # name of the container in the pod
    image: "polinux/stress"        # image the container starts from
    command: ["stress"]            # command executed when the container starts (similar to CMD in a Dockerfile)
    args: ["--vm","1","--vm-bytes","150M","--vm-hang","1"]        # arguments for the start command
# The polinux/stress image is a stress-testing tool; the command and args passed at startup define the load it generates

2.2 Create the pod from the YAML file

## Create the file
[root@k8s-master01 ~]# cat <<EOF > pod1.yaml
apiVersion: v1
kind: Pod
metadata:
  name: "pod-stress"
  namespace: default
spec:
  containers:
  - name: c1
    image: "polinux/stress"
    command: ["stress"]
    args: ["--vm","1","--vm-bytes","150M","--vm-hang","1"]
EOF

## Check for syntax errors (client-side dry run)
[root@k8s-master01 ~]# kubectl apply -f pod1.yaml --dry-run=client
pod/pod-stress created (dry run)

## Create the pod
[root@k8s-master01 ~]# kubectl apply -f pod1.yaml 
pod/pod-stress created

2.3 View pod information

2.3.1 List pods

[root@k8s-master01 ~]# kubectl get pod
NAME         READY   STATUS    RESTARTS   AGE
pod-nginx    1/1     Running   0          22m
pod-stress   1/1     Running   0          35s

2.3.2 List pods with details

[root@k8s-master01 ~]# kubectl get pods -o wide
NAME         READY   STATUS    RESTARTS   AGE   IP               NODE           NOMINATED NODE   READINESS GATES
pod-nginx    1/1     Running   0          33m   10.244.79.71     k8s-worker01   <none>           <none>
pod-stress   1/1     Running   0          10m   10.244.203.203   k8s-worker04   <none>           <none>

2.3.3 Describe the pod

[root@k8s-master01 ~]# kubectl describe pod pod-stress
Name:         pod-stress
Namespace:    default
Priority:     0
Node:         k8s-worker04/192.168.122.17
Start Time:   Sun, 18 Feb 2024 10:48:54 +0800
Labels:       <none>
Annotations:  cni.projectcalico.org/containerID: fb861c71639bb187b203e4eb633026eeb90e4b1b7f6c14843a5b20af48518d62
              cni.projectcalico.org/podIP: 10.244.203.203/32
              cni.projectcalico.org/podIPs: 10.244.203.203/32
Status:       Running
IP:           10.244.203.203
IPs:
  IP:  10.244.203.203
Containers:
  c1:
    Container ID:  docker://c6a429144211d623624189c24c1fae72c1c928d104ff47c6f46a7f5848effdfc
    Image:         polinux/stress
    Image ID:      docker-pullable://polinux/stress@sha256:b6144f84f9c15dac80deb48d3a646b55c7043ab1d83ea0a697c09097aaad21aa
    Port:          <none>
    Host Port:     <none>
    Command:
      stress
    Args:
      --vm
      1
      --vm-bytes
      150M
      --vm-hang
      1
    State:          Running
      Started:      Sun, 18 Feb 2024 10:49:11 +0800
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-wkwtm (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  kube-api-access-wkwtm:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  12m   default-scheduler  Successfully assigned default/pod-stress to k8s-worker04
  Normal  Pulling    12m   kubelet            Pulling image "polinux/stress"
  Normal  Pulled     12m   kubelet            Successfully pulled image "polinux/stress" in 16.549991829s
  Normal  Created    12m   kubelet            Created container c1
  Normal  Started    12m   kubelet            Started container c1

3. Deleting Pods

3.1 Delete a single pod

  • Method 1

    [root@k8s-master01 ~]# kubectl delete pod pod-nginx 
    pod "pod-nginx" deleted
    
  • Method 2

    [root@k8s-master01 ~]# kubectl delete -f pod1.yaml 
    pod "pod-stress" deleted
    

3.2 Delete multiple pods

  • Method 1: pass several pod names

    [root@k8s-master01 ~]# kubectl delete pod pod-name-1 pod-name-2 pod-name-3 ...
    
  • Method 2: extract the pod names with awk and pipe them to xargs

    [root@k8s-master01 ~]# kubectl get pods |awk 'NR>1 {print $1}' |xargs kubectl delete pod
    
  • Method 3: if the pods all live in the same non-default namespace, you can simply delete that namespace

    [root@k8s-master01 ~]# kubectl delete ns xxx
    

III. Pod Image Pull Policy

1. Container image pull policies

  • Controlled by the imagePullPolicy field
    • Always: always pull the image from the registry, whether or not it exists locally
    • Never: never pull from the registry; only use the local image, and if there is none, so be it
    • IfNotPresent: use the local image if it exists, and pull only if it does not
  • Default policy:
    • if the image tag is latest, the default is Always
    • if a specific tag is given, the default is IfNotPresent

Example: modify the YAML

apiVersion: v1
kind: Pod
metadata:
  name: "pod-stress"
  namespace: default
spec:
  containers:
  - name: c1
    image: "polinux/stress"
    command: ["stress"]
    args: ["--vm","1","--vm-bytes","150M","--vm-hang","1"]
    imagePullPolicy: IfNotPresent       # this line was added

Apply the YAML

# View the YAML file
[root@k8s-master01 ~]# cat pod1.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: "pod-stress"
  namespace: default
spec:
  containers:
  - name: c1
    image: "polinux/stress"
    command: ["stress"]
    args: ["--vm","1","--vm-bytes","150M","--vm-hang","1"]
    imagePullPolicy: IfNotPresent

# Check for syntax errors
[root@k8s-master01 ~]# kubectl apply -f pod1.yaml --dry-run=client
pod/pod-stress created (dry run)

# Create the pod
[root@k8s-master01 ~]# kubectl apply -f pod1.yaml
pod/pod-stress created

# Describe the pod
[root@k8s-master01 ~]# kubectl describe pod pod-stress 
······
Events: # event information
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  73s   default-scheduler  Successfully assigned default/pod-stress to k8s-worker04
  Normal  Pulled     73s   kubelet            Container image "polinux/stress" already present on machine
  Normal  Created    73s   kubelet            Created container c1
  Normal  Started    73s   kubelet            Started container c1
Note: the image already exists locally, so it was not pulled from the registry.

IV. Pod Labels

1. Pod labels

  • Labels set on a pod let a controller (covered later) associate itself with the pod through those labels
  • Usage is almost the same as the node labels covered earlier

2. Managing pod labels with commands

2.1 View pod labels

[root@k8s-master01 ~]# kubectl get pod --show-labels 
NAME         READY   STATUS    RESTARTS   AGE   LABELS
pod-stress   1/1     Running   0          14m   <none>

2.2 Add labels, then view them again

[root@k8s-master01 ~]# kubectl label pod pod-stress region=huanai zone=A env=test bussiness=game
pod/pod-stress labeled

[root@k8s-master01 ~]# kubectl get pod --show-labels
NAME         READY   STATUS    RESTARTS   AGE   LABELS
pod-stress   1/1     Running   0          15m   bussiness=game,env=test,region=huanai,zone=A

2.3 Query with an equality-based label selector

[root@k8s-master01 ~]# kubectl get pod -l zone=A
NAME         READY   STATUS    RESTARTS   AGE
pod-stress   1/1     Running   0          16m

2.4 Query with a set-based label selector

[root@k8s-master01 ~]# kubectl get pod -l "zone in(A,B,C)"
NAME         READY   STATUS    RESTARTS   AGE
pod-stress   1/1     Running   0          19m

2.5 Delete the labels and verify again

[root@k8s-master01 ~]# kubectl label pod pod-stress region- zone- env- bussiness-
pod/pod-stress labeled

[root@k8s-master01 ~]# kubectl get pod --show-labels 
NAME         READY   STATUS    RESTARTS   AGE   LABELS
pod-stress   1/1     Running   0          20m   <none>

Summary

  • Pod labels and node labels are operated on in almost exactly the same way
  • Node labels are used to schedule pods onto nodes carrying a given label
  • Pod labels are used by controllers to select the pods they manage, as previewed in the sketch below
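As a preview of how a controller uses pod labels (controllers are covered in a later part of this series), here is only a minimal hypothetical sketch: a Deployment whose selector.matchLabels must match the labels in its pod template, which is exactly how the controller finds and manages its pods. The resource name and label values are illustrative assumptions, not taken from this article.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deploy            # hypothetical name, for illustration only
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx                # the controller selects pods carrying this label
  template:
    metadata:
      labels:
        app: nginx              # pod template label; must match the selector above
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        imagePullPolicy: IfNotPresent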

3. Managing labels in the pod YAML file

3.1 Modify the YAML file

apiVersion: v1
kind: Pod
metadata:
  name: "pod-stress"
  namespace: default
  labels:      # add multiple labels directly in the YAML file
    env: dev
    app: nginx
spec:
  containers:
  - name: c1
    image: "polinux/stress"
    command: ["stress"]
    args: ["--vm","1","--vm-bytes","150M","--vm-hang","1"]
    imagePullPolicy: IfNotPresent

3.2 Check for syntax errors

[root@k8s-master01 ~]# kubectl apply -f pod1.yaml --dry-run=client
pod/pod-stress configured (dry run)

3.3 Apply it

[root@k8s-master01 ~]# kubectl apply -f pod1.yaml 
pod/pod-stress configured    # "configured" means the existing pod was modified

3.4 Verify

[root@k8s-master01 ~]# kubectl get pod --show-labels 
NAME         READY   STATUS    RESTARTS   AGE    LABELS
pod-stress   1/1     Running   0          3m4s   app=nginx,env=dev

V. Pod Resource Limits

Prepare two YAML files that create pods with two different resource-limit configurations.

1. Prepare and run the first YAML file

# Write the YAML file
[root@k8s-master01 ~]# cat pod2.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: namespace1
---
apiVersion: v1
kind: Pod
metadata:
  name: "pod-stress2"
  namespace: namespace1
  labels:
    env: dev
    app: pod-stress
spec:
  containers:
  - name: c1
    image: "polinux/stress"
    imagePullPolicy: IfNotPresent
    resources:
      limits:
        memory: "200Mi"    #上限是200M
      requests:
        memory: "100Mi"    #保证至少100M
    command: ["stress"]             # 启动容器时执行的命令
    args: ["--vm","1","--vm-bytes","150M","--vm-hang","1"]         #产生1个进程分配150M内存1秒后释放
# Check for syntax errors
[root@k8s-master01 ~]# kubectl apply -f pod2.yaml --dry-run=client
namespace/namespace1 created (dry run)
pod/pod-stress2 created (dry run)
# Apply the YAML file
[root@k8s-master01 ~]# kubectl apply -f pod2.yaml
namespace/namespace1 unchanged
pod/pod-stress2 created
# View details
[root@k8s-master01 ~]# kubectl get pod -n namespace1 -o wide
NAME          READY   STATUS    RESTARTS   AGE   IP               NODE           NOMINATED NODE   READINESS GATES
pod-stress2   1/1     Running   0          89s   10.244.203.205   k8s-worker04   <none>           <none>

2. Prepare and run the second YAML file

# Write the YAML file
apiVersion: v1
kind: Namespace
metadata:
  name: namespace1
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-stress3
  namespace: namespace1
  labels: 
    env: dev
    app: nginx
spec:
  containers:
  - name: c2
    image: "polinux/stress"
    imagePullPolicy: IfNotPresent
    resources:
      limits:
        memory: "200Mi"
      requests:
        memory: "150Mi"   #最小保证内存150M
    command: ["stress"]
    args: ["--vm","1","--vm-bytes","250M","--vm-hang","1"]         #产生1个进程分配250M内存1秒后释放,容器启动分配内容大于限制的内容,查看效果
# Check for syntax errors
[root@k8s-master01 ~]# kubectl apply -f pod3.yaml --dry-run=client
namespace/namespace1 configured (dry run)
pod/pod-stress3 created (dry run)
# Apply the YAML file
[root@k8s-master01 ~]# kubectl apply -f pod3.yaml 
namespace/namespace1 unchanged
pod/pod-stress3 created
# Check the pods: pod-stress3 is in OOMKilled state
[root@k8s-master01 ~]# kubectl get pod -n namespace1 
NAME          READY   STATUS      RESTARTS   AGE
pod-stress2   1/1     Running     0          14m
pod-stress3   0/1     OOMKilled   6          5m56s

# Find out why it was OOMKilled
[root@k8s-master01 ~]# kubectl describe  pod -n namespace1 pod-stress3
Name:         pod-stress3
Namespace:    namespace1
Priority:     0
Node:         k8s-worker04/192.168.122.17
Start Time:   Sun, 18 Feb 2024 15:37:04 +0800
Labels:       app=nginx
              env=dev
Annotations:  cni.projectcalico.org/containerID: a81602adb0cf912476fbb10c217442f1126c8ca4856214a749a1420d4826ccf7
              cni.projectcalico.org/podIP: 10.244.203.206/32
              cni.projectcalico.org/podIPs: 10.244.203.206/32
Status:       Running
IP:           10.244.203.206
IPs:
  IP:  10.244.203.206
Containers:
  c2:
    Container ID:  docker://02bab51cc71d9eb8b5ce8409d73e2a909936bab656f02f2f987d20480bdde655
    Image:         polinux/stress
    Image ID:      docker-pullable://polinux/stress@sha256:b6144f84f9c15dac80deb48d3a646b55c7043ab1d83ea0a697c09097aaad21aa
    Port:          <none>
    Host Port:     <none>
    Command:
      stress
    Args:
      --vm
      1
      --vm-bytes
      250M
      --vm-hang
      1
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       OOMKilled     ## OOMKilled: the container exceeded its memory limit
      Exit Code:    1
      Started:      Sun, 18 Feb 2024 15:38:36 +0800
      Finished:     Sun, 18 Feb 2024 15:38:36 +0800
    Ready:          False
    Restart Count:  4
    Limits:
      memory:  200Mi
    Requests:
      memory:     150Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-kpzpb (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  kube-api-access-kpzpb:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  2m18s                 default-scheduler  Successfully assigned namespace1/pod-stress3 to k8s-worker04
  Normal   Pulled     46s (x5 over 2m18s)   kubelet            Container image "polinux/stress" already present on machine
  Normal   Created    46s (x5 over 2m18s)   kubelet            Created container c2
  Normal   Started    46s (x5 over 2m18s)   kubelet            Started container c2
  Warning  BackOff    33s (x10 over 2m17s)  kubelet            Back-off restarting failed container   ## repeated start failures

3. Notes

  • Once a container in a pod dies, the container's restart policy applies (a minimal restartPolicy sketch follows this list):
    • Always: always restart the container when it dies; this is the default policy
    • OnFailure: restart only when the container terminates with an error (non-zero exit code); a container that terminates normally is not restarted
    • Never: never restart the container when it dies
  • With Always, a container that keeps dying would otherwise be restarted immediately every time, which is very wasteful. So the Always policy actually works like this: the first time the container dies it is restarted immediately, the next time after a 10s delay, the third time after 20s, and so on
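For reference, restartPolicy is set at the pod spec level (not per container). Here is only a minimal sketch, reusing the polinux/stress image from above; the pod name is an illustrative assumption:

apiVersion: v1
kind: Pod
metadata:
  name: pod-restart-demo        # hypothetical name, for illustration
  namespace: default
spec:
  restartPolicy: OnFailure      # Always (default) | OnFailure | Never; applies to all containers in the pod
  containers:
  - name: c1
    image: "polinux/stress"
    imagePullPolicy: IfNotPresent
    command: ["stress"]
    args: ["--vm","1","--vm-bytes","150M","--vm-hang","1"]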

4. Clean up the pod and namespace after testing

# Deleting the namespace also deletes all pods and containers inside it
[root@k8s-master01 ~]# kubectl delete  ns  namespace1

# Verify
[root@k8s-master01 ~]# kubectl get pod
NAME         READY   STATUS    RESTARTS   AGE
pod-stress   1/1     Running   0          72m

[root@k8s-master01 ~]# kubectl get ns
NAME               STATUS   AGE
calico-apiserver   Active   18d
calico-system      Active   18d
default            Active   18d
fh                 Active   16d
gh-pod             Active   18d
kube-node-lease    Active   18d
kube-public        Active   18d
kube-system        Active   18d
kuboard            Active   18d
tigera-operator    Active   18d

VI. A Pod Containing Multiple Containers

1. Prepare the YAML file

[root@k8s-master01 ~]# cat pod4.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: pod-stress
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-stress4
  namespace: pod-stress
  labels: 
    env: dev
    app: nginx
spec:
  containers:
  - name: c1
    image: "polinux/stress"
    imagePullPolicy: IfNotPresent
    resources:
      limits:
        memory: "200Mi"
      requests:
        memory: "100Mi"
    command: ["stress"]
    args: ["--vm","1","--vm-bytes","150M","--vm-hang","1"]
  
  - name: c2
    image: "polinux/stress"
    imagePullPolicy: IfNotPresent
    resources:
      limits:
        memory: "200Mi"
      requests:
        memory: "100Mi"
    command: ["stress"]
    args: ["--vm","1","--vm-bytes","150M","--vm-hang","1"]

2. Check syntax and apply

[root@k8s-master01 ~]# kubectl apply -f pod4.yaml --dry-run=client
namespace/pod-stress created (dry run)
pod/pod-stress4 created (dry run)

[root@k8s-master01 ~]# kubectl apply -f pod4.yaml
namespace/pod-stress created
pod/pod-stress4 created

3. View pod information

[root@k8s-master01 ~]# kubectl get pods --namespace pod-stress 
NAME          READY   STATUS    RESTARTS   AGE
pod-stress4   2/2     Running   0          50s
[root@k8s-master01 ~]# kubectl get pods --namespace pod-stress -o wide
NAME          READY   STATUS    RESTARTS   AGE   IP               NODE           NOMINATED NODE   READINESS GATES
pod-stress4   2/2     Running   0          59s   10.244.203.207   k8s-worker04   <none>           <none>

You can also view the pod in the kuboard web UI (screenshot unavailable).

VII. Operating on Containers in a Pod

1. Command help

[root@k8s-master01 ~]# kubectl exec -h

2. Run a command without an interactive session

Format: kubectl exec <pod-name> -c <container-name> -- <command>

Notes:

  • -c <container-name> is optional; with a single container in the pod it is not needed
  • with multiple containers in the pod, the first container is used if none is specified
[root@k8s-master01 ~]# kubectl exec -n pod-stress pod-stress4 -- date
Defaulted container "c1" out of: c1, c2
Sun Feb 18 08:18:32 UTC 2024

[root@k8s-master01 ~]# kubectl exec -n pod-stress pod-stress4 -c c1 -- date
Sun Feb 18 08:19:14 UTC 2024

[root@k8s-master01 ~]# kubectl exec -n pod-stress pod-stress4 -c c2 -- date
Sun Feb 18 08:19:18 UTC 2024
  • If no container name is given, the first container in the pod is used by default

3. Run a command interactively

Almost identical to docker exec:
[root@k8s-master01 ~]# kubectl exec -it -n pod-stress pod-stress4 -c c1 -- /bin/bash
bash-5.0# date
Sun Feb 18 08:23:51 UTC 2024
bash-5.0# 

VIII. Network Sharing Among Containers in a Pod

Verification: what happens if the same service runs more than once inside a single pod, using nginx as the example.

1. Write the YAML file

[root@k8s-master01 yaml]# cat nginx-test.yaml 
apiVersion: v1
kind: Namespace
metadata:
  name: nginx-test
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx-test
  namespace: nginx-test
  labels: 
    zone: A
    region: wh
spec:
  containers:
  - name: nginx-test1
    image: "nginx:latest"
    imagePullPolicy: IfNotPresent
      
  - name: nginx-test2
    image: "nginx:latest"
    imagePullPolicy: IfNotPresent

2. Check for syntax errors and apply the YAML

[root@k8s-master01 yaml]# kubectl apply -f nginx-test.yaml --dry-run=client
namespace/nginx-test created (dry run)
pod/nginx-test created (dry run)

[root@k8s-master01 yaml]# kubectl apply -f nginx-test.yaml 
namespace/nginx-test created
pod/nginx-test created

3. View pod information and status

[root@k8s-master01 yaml]# kubectl get pod -n nginx-test
NAME         READY   STATUS             RESTARTS   AGE
nginx-test   1/2     CrashLoopBackOff   3          78s

[root@k8s-master01 yaml]# kubectl get pod -n nginx-test -o wide
NAME         READY   STATUS             RESTARTS   AGE   IP               NODE           NOMINATED NODE   READINESS GATES
nginx-test   1/2     CrashLoopBackOff   3          80s   10.244.203.208   k8s-worker04   <none>           <none>

4. Describe the pod

[root@k8s-master01 yaml]# kubectl describe pod -n nginx-test nginx-test 
Name:         nginx-test
Namespace:    nginx-test
Priority:     0
Node:         k8s-worker04/192.168.122.17
Start Time:   Sun, 18 Feb 2024 16:35:27 +0800
Labels:       region=wh
              zone=A
Annotations:  cni.projectcalico.org/containerID: f46f377b4dc0f91e49f577f64734c437c1ba7d9c3963e6be91296bf3d2949ae3
              cni.projectcalico.org/podIP: 10.244.203.208/32
              cni.projectcalico.org/podIPs: 10.244.203.208/32
Status:       Running
IP:           10.244.203.208
IPs:
  IP:  10.244.203.208
Containers:
  nginx-test1:
    Container ID:   docker://7e07e88ee7735cf599e13a2cbc7cccf3bcfa440051964beee5c98dac275b15c3
    Image:          nginx:latest
    Image ID:       docker-pullable://nginx@sha256:0d17b565c37bcbd895e9d92315a05c1c3c9a29f762b011a10c54a66cd53c9b31
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Sun, 18 Feb 2024 16:35:27 +0800
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-p9fw6 (ro)
  nginx-test2:
    Container ID:   docker://91d4dfe2c786616f0f64db7a941c08e0e5b4e97e990150642b24425b367a7a15
    Image:          nginx:latest
    Image ID:       docker-pullable://nginx@sha256:0d17b565c37bcbd895e9d92315a05c1c3c9a29f762b011a10c54a66cd53c9b31
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Sun, 18 Feb 2024 16:37:06 +0800
      Finished:     Sun, 18 Feb 2024 16:37:09 +0800
    Ready:          False
    Restart Count:  4
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-p9fw6 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  kube-api-access-p9fw6:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  2m21s                default-scheduler  Successfully assigned nginx-test/nginx-test to k8s-worker04
  Normal   Pulled     2m21s                kubelet            Container image "nginx:latest" already present on machine
  Normal   Created    2m21s                kubelet            Created container nginx-test1
  Normal   Started    2m21s                kubelet            Started container nginx-test1
  Warning  BackOff    53s (x7 over 2m15s)  kubelet            Back-off restarting failed container
  Normal   Pulled     42s (x5 over 2m21s)  kubelet            Container image "nginx:latest" already present on machine
  Normal   Created    42s (x5 over 2m21s)  kubelet            Created container nginx-test2
  Normal   Started    42s (x5 over 2m21s)  kubelet            Started container nginx-test2
The reason nginx-test2 fails to start is that two instances of the same application (nginx) are running in one pod. The pod has only one IP address, shared by both containers, and nginx listens on port 80 by default. Once nginx-test1 is running, port 80 is taken, so nginx-test2 cannot bind to it. We can confirm this by looking at nginx-test2's logs.

5. View the logs of the container that failed to start

# Method 1: use kubectl logs
[root@k8s-master01 yaml]# kubectl logs -n nginx-test nginx-test nginx-test2
/docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
/docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
/docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
10-listen-on-ipv6-by-default.sh: info: Getting the checksum of /etc/nginx/conf.d/default.conf
10-listen-on-ipv6-by-default.sh: info: Enabled listen on IPv6 in /etc/nginx/conf.d/default.conf
/docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
/docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
/docker-entrypoint.sh: Configuration complete; ready for start up
2024/02/18 08:38:31 [emerg] 1#1: bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
2024/02/18 08:38:31 [emerg] 1#1: bind() to [::]:80 failed (98: Address already in use)
nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
2024/02/18 08:38:31 [notice] 1#1: try again to bind() after 500ms
2024/02/18 08:38:31 [emerg] 1#1: bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
2024/02/18 08:38:31 [emerg] 1#1: bind() to [::]:80 failed (98: Address already in use)
nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
2024/02/18 08:38:31 [notice] 1#1: try again to bind() after 500ms
2024/02/18 08:38:31 [emerg] 1#1: bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
2024/02/18 08:38:31 [emerg] 1#1: bind() to [::]:80 failed (98: Address already in use)
nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
2024/02/18 08:38:31 [notice] 1#1: try again to bind() after 500ms
2024/02/18 08:38:31 [emerg] 1#1: bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
2024/02/18 08:38:31 [emerg] 1#1: bind() to [::]:80 failed (98: Address already in use)
nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
2024/02/18 08:38:31 [notice] 1#1: try again to bind() after 500ms
2024/02/18 08:38:31 [emerg] 1#1: bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
2024/02/18 08:38:31 [emerg] 1#1: bind() to [::]:80 failed (98: Address already in use)
nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
2024/02/18 08:38:31 [notice] 1#1: try again to bind() after 500ms
2024/02/18 08:38:31 [emerg] 1#1: still could not bind()
nginx: [emerg] still could not bind()

# Method 2: use docker logs; the pod was scheduled onto k8s-worker04, so switch to that node to run this
[root@k8s-worker04 ~]# docker logs k8s_nginx-test2_nginx-test_nginx-test_766e074a-2564-4158-8e14-035af09309bb_6 
/docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
/docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
/docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
10-listen-on-ipv6-by-default.sh: info: Getting the checksum of /etc/nginx/conf.d/default.conf
10-listen-on-ipv6-by-default.sh: info: Enabled listen on IPv6 in /etc/nginx/conf.d/default.conf
/docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
/docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
/docker-entrypoint.sh: Configuration complete; ready for start up
2024/02/18 08:41:19 [emerg] 1#1: bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
2024/02/18 08:41:19 [emerg] 1#1: bind() to [::]:80 failed (98: Address already in use)
nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
2024/02/18 08:41:19 [notice] 1#1: try again to bind() after 500ms
2024/02/18 08:41:19 [emerg] 1#1: bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
2024/02/18 08:41:19 [emerg] 1#1: bind() to [::]:80 failed (98: Address already in use)
nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
2024/02/18 08:41:19 [notice] 1#1: try again to bind() after 500ms
2024/02/18 08:41:19 [emerg] 1#1: bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
2024/02/18 08:41:19 [emerg] 1#1: bind() to [::]:80 failed (98: Address already in use)
nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
2024/02/18 08:41:19 [notice] 1#1: try again to bind() after 500ms
2024/02/18 08:41:19 [emerg] 1#1: bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
2024/02/18 08:41:19 [emerg] 1#1: bind() to [::]:80 failed (98: Address already in use)
nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
2024/02/18 08:41:19 [notice] 1#1: try again to bind() after 500ms
2024/02/18 08:41:19 [emerg] 1#1: bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
2024/02/18 08:41:19 [emerg] 1#1: bind() to [::]:80 failed (98: Address already in use)
nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
2024/02/18 08:41:19 [notice] 1#1: try again to bind() after 500ms
2024/02/18 08:41:19 [emerg] 1#1: still could not bind()
nginx: [emerg] still could not bind()
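The logs confirm the port conflict. Because all containers in a pod share one network namespace and one IP, two processes can coexist only if they bind different ports. Here is only a minimal sketch of that idea, under assumed names: nginx keeps port 80 while a busybox sidecar serves on port 8080 with its httpd applet.

apiVersion: v1
kind: Pod
metadata:
  name: nginx-sidecar-demo          # hypothetical name, for illustration
  namespace: nginx-test
spec:
  containers:
  - name: web
    image: "nginx:latest"           # listens on port 80
    imagePullPolicy: IfNotPresent
  - name: sidecar
    image: "busybox"
    imagePullPolicy: IfNotPresent
    command: ["httpd", "-f", "-p", "8080", "-h", "/tmp"]   # busybox httpd in the foreground on port 8080

Both containers can then reach each other over localhost, since they share the pod's IP and port space.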

IX. Pod Scheduling



Step 1
kubectl applies the resource manifest file (YAML) and sends a create pod request to the api server.

Step 2
After receiving the pod creation request, the api server generates a resource manifest containing the creation information.

Step 3
The api server writes the information from the manifest into the etcd database.

Step 4
The Scheduler continuously watches the API Server for Pods whose pod.spec.nodeName is empty, i.e. it checks pod.spec.nodeName == null. If it is null, the Pod is new and needs scheduling, so the Scheduler runs its scheduling computation (two phases: 1. filter out nodes that do not satisfy the constraints, 2. pick the highest-priority node among the remainder), finds a suitable node, and updates the result in etcd: pod.spec.nodeName = nodeA (a concrete node).

Step 5
The kubelet watches etcd (i.e. keeps reading the records in etcd). When it finds a record whose assigned node matches its own node, the Pod has been assigned to it by the scheduler, so it calls the node's Container Runtime to create the containers, and reports the result back to the api server, which updates the state in etcd.
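One way to observe the result of this flow is to read the nodeName that the scheduler wrote back into the pod spec. A small sketch, shown without output and using the nginx-test pod created earlier in this article:

# Print the node the scheduler wrote into the pod spec (empty while the pod is still unscheduled)
kubectl get pod -n nginx-test nginx-test -o jsonpath='{.spec.nodeName}{"\n"}'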

1. Scheduling constraint methods

To balance resource usage across container hosts, constraints can be used to schedule a pod onto a specific node:

  • nodeName schedules the pod onto the node with the given name
  • nodeSelector schedules the pod onto a node whose labels match

2. nodeName

2.1 Write the YAML file

[root@k8s-master01 yaml]# cat nginx-test.yaml 
apiVersion: v1
kind: Namespace
metadata:
  name: nginx-test
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx-test
  namespace: nginx-test
  labels: 
    zone: A
    region: wh
spec:
  nodeName: k8s-worker01   # schedule onto the k8s-worker01 node via nodeName
  containers:
  - name: nginx-test1
    image: "nginx:latest"
    imagePullPolicy: IfNotPresent

2.2 Check syntax and apply

[root@k8s-master01 yaml]# kubectl apply -f nginx-test.yaml --dry-run=client
namespace/nginx-test created (dry run)
pod/nginx-test created (dry run)
[root@k8s-master01 yaml]# kubectl apply -f nginx-test.yaml
namespace/nginx-test created
pod/nginx-test created

2.3 Verify

[root@k8s-master01 yaml]# kubectl get pod -n nginx-test nginx-test -o wide
NAME         READY   STATUS    RESTARTS   AGE   IP             NODE           NOMINATED NODE   READINESS GATES
nginx-test   1/1     Running   0          45s   10.244.79.75   k8s-worker01   <none>           <none>

[root@k8s-master01 yaml]# kubectl describe pod -n nginx-test nginx-test 
Name:         nginx-test
Namespace:    nginx-test
Priority:     0
Node:         k8s-worker01/192.168.122.14
Start Time:   Sun, 18 Feb 2024 17:09:33 +0800
Labels:       region=wh
              zone=A
Annotations:  cni.projectcalico.org/containerID: fa739674cca9ea7adb42545fa68cbcc205c6d0290660a54618e73d4e66c3ef4e
              cni.projectcalico.org/podIP: 10.244.79.75/32
              cni.projectcalico.org/podIPs: 10.244.79.75/32
Status:       Running
IP:           10.244.79.75
IPs:
  IP:  10.244.79.75
Containers:
  nginx-test1:
    Container ID:   docker://39afe3ea0a209fcea5e86f30592afc22fa62365aeec0e577a27adec4677a2042
    Image:          nginx:latest
    Image ID:       docker-pullable://nginx@sha256:0d17b565c37bcbd895e9d92315a05c1c3c9a29f762b011a10c54a66cd53c9b31
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Sun, 18 Feb 2024 17:09:33 +0800
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-tdnkn (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  kube-api-access-tdnkn:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason   Age   From     Message
  ----    ------   ----  ----     -------
  Normal  Pulled   71s   kubelet  Container image "nginx:latest" already present on machine
  Normal  Created  71s   kubelet  Created container nginx-test1
  Normal  Started  71s   kubelet  Started container nginx-test1
  # Note that there is no Scheduled event from default-scheduler: the pod was run directly on the node, which shows the nodeName constraint took effect

3. nodeSelector

Create the pod on a node that carries a specific label.

3.1 Label k8s-worker02

[root@k8s-master01 yaml]# kubectl label nodes k8s-worker02 bussiness=game
node/k8s-worker02 labeled

3.2 Write the YAML file

[root@k8s-master01 yaml]# cat nginx-test-01.yaml 
apiVersion: v1
kind: Namespace
metadata:
  name: nginx-test
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx-test-01
  namespace: nginx-test
  labels: 
    zone: A
    region: wh
spec:
  nodeSelector:
    bussiness: game
  containers:
  - name: nginx-test-01
    image: "nginx:latest"
    imagePullPolicy: IfNotPresent

3.3 Check for syntax errors and apply

[root@k8s-master01 yaml]# kubectl apply -f nginx-test-01.yaml --dry-run=client
namespace/nginx-test configured (dry run)
pod/nginx-test-01 created (dry run)

[root@k8s-master01 yaml]# kubectl apply -f nginx-test-01.yaml
namespace/nginx-test unchanged
pod/nginx-test-01 created

3.4 Verify

[root@k8s-master01 yaml]# kubectl get pod -n nginx-test nginx-test-01 
NAME            READY   STATUS    RESTARTS   AGE
nginx-test-01   1/1     Running   0          32s

[root@k8s-master01 yaml]# kubectl get pod -n nginx-test nginx-test-01 -o wide
NAME            READY   STATUS    RESTARTS   AGE   IP              NODE           NOMINATED NODE   READINESS GATES
nginx-test-01   1/1     Running   0          37s   10.244.69.202   k8s-worker02   <none>           <none>

[root@k8s-master01 yaml]# kubectl describe pod -n nginx-test nginx-test-01 
Name:         nginx-test-01
Namespace:    nginx-test
Priority:     0
Node:         k8s-worker02/192.168.122.15
Start Time:   Sun, 18 Feb 2024 17:25:25 +0800
Labels:       region=wh
              zone=A
Annotations:  cni.projectcalico.org/containerID: f08fb481999234878f2f7a379153cc7020ccd556e0d1b0d88b504eb1c8dace31
              cni.projectcalico.org/podIP: 10.244.69.202/32
              cni.projectcalico.org/podIPs: 10.244.69.202/32
Status:       Running
IP:           10.244.69.202
IPs:
  IP:  10.244.69.202
Containers:
  nginx-test-01:
    Container ID:   docker://b917cf2caaa5ce540dd0a9f977b9a51044b7c1c315fa170051947a2c43771cc8
    Image:          nginx:latest
    Image ID:       docker-pullable://nginx@sha256:0d17b565c37bcbd895e9d92315a05c1c3c9a29f762b011a10c54a66cd53c9b31
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Sun, 18 Feb 2024 17:25:26 +0800
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-6zbp7 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  kube-api-access-6zbp7:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              bussiness=game
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  105s  default-scheduler  Successfully assigned nginx-test/nginx-test-01 to k8s-worker02
  Normal  Pulled     105s  kubelet            Container image "nginx:latest" already present on machine
  Normal  Created    105s  kubelet            Created container nginx-test-01
  Normal  Started    105s  kubelet            Started container nginx-test-01
  # This time the default scheduler was still involved, and the pod was indeed placed on k8s-worker02

X. Pod Lifecycle

1. Pod lifecycle

A pod's lifecycle is the whole process from its creation to its termination.

  • Some pods (for example one running an httpd service) normally keep running indefinitely, but terminate if deleted manually
  • Other pods (for example one running a computation job) terminate on their own once the task is finished


2. Container startup

(1) Before the containers in a pod are created, init containers can run to prepare the environment (a minimal init container sketch follows this list)
(2) After initialization finishes, the main containers start
(3) After a main container starts, a post start action can run (a trigger action after startup, also called the post-start hook)
(4) After post start, health checking begins (probes are covered later)

  • the first health check is the liveness probe, which checks whether the main container is alive
  • the second health check is the readiness probe, which checks whether the main container has started and is ready to serve
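Init containers are mentioned here but not demonstrated elsewhere in this article, so the following is only a minimal sketch under assumed names: a busybox init container that must run to completion before the main nginx container starts.

apiVersion: v1
kind: Pod
metadata:
  name: init-demo                  # hypothetical name, for illustration
spec:
  initContainers:                  # run to completion, in order, before the main containers start
  - name: init-wait
    image: "busybox"
    imagePullPolicy: IfNotPresent
    command: ["sh", "-c", "echo preparing environment; sleep 5"]
  containers:
  - name: web
    image: "nginx:latest"
    imagePullPolicy: IfNotPresent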

3. Container termination

(1) A pre stop action can be configured to run before the container terminates (a trigger action before termination, also called the pre-stop hook; a minimal sketch follows)
(2) If something unusual prevents the pod from being destroyed normally, it is force-terminated after roughly 30 seconds
(3) After termination the container may be restarted, depending on the restart policy (described in the content above).
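The article demonstrates postStart in section 6 below but never preStop, so here is only a minimal mirror sketch under assumed names: a hook that runs just before the nginx container is stopped.

apiVersion: v1
kind: Pod
metadata:
  name: prestop-demo               # hypothetical name, for illustration
spec:
  containers:
  - name: web
    image: "nginx:latest"
    imagePullPolicy: IfNotPresent
    lifecycle:
      preStop:                     # runs before the container is terminated
        exec:
          command: ["/bin/sh", "-c", "nginx -s quit; sleep 3"]   # ask nginx to shut down gracefully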

4. Restart policy

  • Always: always restart the container when it dies; this is the default policy
  • OnFailure: restart only when the container terminates with an error, i.e. a container that terminates normally is not restarted
  • Never: never restart the container when it dies
  • With Always, a container that keeps dying would otherwise be restarted immediately every time, which is very wasteful. So the Always policy works like this: the first time the container dies it is restarted immediately, the second time after a 10s delay, the third time after 20s, and so on

5. HealthCheck health checks

When a pod starts, a container may be unreachable because of some error (its service did not start, or the port is wrong); probes can detect this and act according to a configured policy.

5.1 HealthCheck types

  • Liveness Probe (liveness check): indicates whether the container is running (Running state). If the liveness probe fails, the kubelet kills the container, and what happens next is decided by its restart policy. If a container provides no liveness probe, the default state is Success.
  • Readiness Probe (readiness check): indicates whether the container is ready to serve requests (Ready state). If the readiness probe fails, the endpoints controller removes the Pod's IP address from the endpoints of all Services matching the Pod. The readiness state before the initial delay defaults to Failure. If a container provides no readiness probe, the default state is Success. Note: a container that fails the check is marked NotReady, and if a Service is used for access, traffic is not forwarded to pods in that state.
  • Startup Probe: indicates whether the application inside the container has started. If a startup probe is provided, all other probes (the two above) are disabled until it succeeds; only then do the other probes start. If the startup probe fails, the kubelet kills the container and the container is restarted according to its restart policy. If a container provides no startup probe, the default state is Success.
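The liveness and readiness probes are exercised in the cases below, but the startup probe is not, so here is only a minimal sketch under assumed names: a slow-starting nginx is given up to 30 x 5 s to come up before the other probes (and the kill-and-restart logic) take over.

apiVersion: v1
kind: Pod
metadata:
  name: startup-demo               # hypothetical name, for illustration
spec:
  containers:
  - name: web
    image: "nginx:latest"
    imagePullPolicy: IfNotPresent
    ports:
    - name: http
      containerPort: 80
    startupProbe:
      httpGet:
        port: http
        path: /index.html
      failureThreshold: 30         # allow up to 30 failed attempts
      periodSeconds: 5             # probe every 5 s, i.e. up to 150 s for the app to start
    livenessProbe:                 # enabled only after the startup probe has succeeded
      httpGet:
        port: http
        path: /index.html
      periodSeconds: 5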

5.2 Probe mechanisms

  • Exec: run a command inside the container
  • HTTPGet: send an HTTP request to a URL path
  • TCPSocket: open a TCP connection to a port
  • gRPC: perform a remote procedure call using gRPC. The target should implement gRPC health checking; the diagnosis is considered successful if the response status is "SERVING". gRPC probes are an alpha feature and can only be used when the "GRPCContainerProbe" feature gate is enabled. (A minimal sketch follows.)
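Since gRPC probes are not demonstrated in this article, the following is only a minimal sketch under assumed names (it requires the feature gate mentioned above and an image that actually exposes a gRPC health service on the given port; the image and port are illustrative assumptions):

apiVersion: v1
kind: Pod
metadata:
  name: grpc-demo                  # hypothetical name, for illustration
spec:
  containers:
  - name: app
    image: "example/grpc-app"      # hypothetical image serving gRPC health checks
    ports:
    - containerPort: 9090
    livenessProbe:
      grpc:
        port: 9090                 # the gRPC health-check port of the application
      initialDelaySeconds: 5
      periodSeconds: 10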

5.3 liveness-exec example

Probes the container; if a problem is found, the container is restarted.

5.3.1 Prepare the YAML file

[root@k8s-master01 yaml]# cat pod-liveness-exec.yaml 
apiVersion: v1
kind: Namespace
metadata:
  name: liveness
---
apiVersion: v1
kind: Pod
metadata:
  name: liveness-exec
  namespace: liveness
  labels:
    zone: A
    region: wh
spec:
  containers:
  - name: liveness
    image: "busybox"
    imagePullPolicy: IfNotPresent
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5    # first probe 5 s after the container starts
      periodSeconds: 5    # probe every 5 s

YAML file walkthrough
  1. apiVersion: v1: specifies the API version of the Kubernetes object, here v1.
  2. kind: Namespace: specifies the type of object to create, here a Namespace.
  3. metadata:: begins the object's metadata section.
  4. name: liveness: sets the Namespace object's name to liveness.
  5. ---: the separator between different objects in one YAML document.
  6. apiVersion: v1: again specifies the API version, still v1.
  7. kind: Pod: specifies the type of object to create, here a Pod.
  8. metadata:: begins the object's metadata section.
  9. name: liveness-exec: sets the Pod object's name to liveness-exec.
  10. namespace: liveness: places the Pod object in the liveness Namespace.
  11. labels:: begins the labels section; labels are used to classify and select Kubernetes objects.
  12. zone: A: a label named zone with the value A.
  13. region: wh: a label named region with the value wh.
  14. spec:: begins the spec section, which defines the object's specification and behavior.
  15. containers:: begins the containers section; a Pod can contain one or more containers.
  16. - name: liveness: names the first container liveness.
  17. image: "busybox": the image the container uses, busybox.
  18. imagePullPolicy: IfNotPresent: the image pull policy is IfNotPresent, i.e. pull only if the image does not exist locally.
  19. args:: begins the container's startup arguments.
  20. - /bin/sh: the first argument is /bin/sh.
  21. - -c: the second argument is -c, meaning what follows is executed as a command.
  22. - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600: the command executed in the container: create the file /tmp/healthy, wait 30 seconds, delete /tmp/healthy, then wait 600 seconds.
  23. livenessProbe:: begins the liveness probe, used to check the container's health.
  24. exec:: the liveness probe uses the exec mechanism.
  25. command:: the list of commands to execute.
  26. - cat: the command is cat.
  27. - /tmp/healthy: the command's argument is /tmp/healthy.
  28. initialDelaySeconds: 5: wait 5 seconds after the container starts before running the liveness probe for the first time.
  29. periodSeconds: 5: run the liveness probe every 5 seconds.

5.3.2 Check for syntax errors and apply

[root@k8s-master01 yaml]# kubectl apply -f pod-liveness-exec.yaml --dry-run=client
namespace/liveness created (dry run)
pod/liveness-exec created (dry run)
[root@k8s-master01 yaml]# kubectl apply -f pod-liveness-exec.yaml

5.3.3 Observe with the following commands

[root@k8s-master01 yaml]# kubectl describe pod -n liveness liveness-exec 
…………
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  42s   default-scheduler  Successfully assigned liveness/liveness-exec to k8s-worker04
  Normal  Pulling    42s   kubelet            Pulling image "busybox"
  Normal  Pulled     25s   kubelet            Successfully pulled image "busybox" in 16.577767141s
  Normal  Created    25s   kubelet            Created container liveness
  Normal  Started    25s   kubelet            Started container liveness

A few minutes later, observe again: the RESTARTS value is 1.

[root@k8s-master01 yaml]# kubectl get pod -n liveness liveness-exec -o wide
NAME            READY   STATUS    RESTARTS   AGE    IP               NODE           NOMINATED NODE   READINESS GATES
liveness-exec   1/1     Running   1          112s   10.244.203.209   k8s-worker04   <none>           <none>

A few more minutes later, the status has become CrashLoopBackOff.

[root@k8s-master01 yaml]# kubectl get pod -n liveness liveness-exec -o wide
NAME            READY   STATUS             RESTARTS   AGE   IP               NODE           NOMINATED NODE   READINESS GATES
liveness-exec   0/1     CrashLoopBackOff   7          14m   10.244.203.209   k8s-worker04   <none>           <none>

Observe the details:

[root@k8s-master01 yaml]# kubectl describe pod -n liveness liveness-exec 
…………
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  17m                    default-scheduler  Successfully assigned liveness/liveness-exec to k8s-worker04
  Normal   Pulling    17m                    kubelet            Pulling image "busybox"
  Normal   Pulled     16m                    kubelet            Successfully pulled image "busybox" in 16.577767141s
  Warning  Unhealthy  13m (x9 over 16m)      kubelet            Liveness probe failed: cat: can't open '/tmp/healthy': No such file or directory
  Normal   Killing    13m (x3 over 16m)      kubelet            Container liveness failed liveness probe, will be restarted
  Normal   Created    13m (x4 over 16m)      kubelet            Created container liveness
  Normal   Started    13m (x4 over 16m)      kubelet            Started container liveness
  Normal   Pulled     6m31s (x7 over 15m)    kubelet            Container image "busybox" already present on machine
  Warning  BackOff    111s (x25 over 9m16s)  kubelet            Back-off restarting failed container
  # Scheduled onto k8s-worker04 17m ago; the liveness probe started failing 13m ago

5.3.4 Verifying the container restart policy

Change the container restart policy in the YAML file above to Never, then observe again.

[root@k8s-master01 yaml]# cat pod-liveness-exec.yaml 
apiVersion: v1
kind: Namespace
metadata:
  name: liveness
---
apiVersion: v1
kind: Pod
metadata:
  name: liveness-exec
  namespace: liveness
  labels:
    zone: A
    region: wh
spec:
  restartPolicy: Never
  containers:
  - name: liveness
    image: "busybox"
    imagePullPolicy: IfNotPresent
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5

Apply and verify:

[root@k8s-master01 yaml]# kubectl apply -f pod-liveness-exec.yaml --dry-run=client
namespace/liveness created (dry run)
pod/liveness-exec created (dry run)
[root@k8s-master01 yaml]# kubectl apply -f pod-liveness-exec.yaml
namespace/liveness created
pod/liveness-exec created

After waiting a while, observe: once the liveness probe fails, the container is not restarted, nor does it continue the sleep 600; it is simply stopped.

[root@k8s-master01 yaml]# kubectl get pod -n liveness liveness-exec
NAME            READY   STATUS   RESTARTS   AGE
liveness-exec   0/1     Error    0          117s

[root@k8s-master01 yaml]# kubectl describe pod -n liveness liveness-exec 
…………
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  78s                default-scheduler  Successfully assigned liveness/liveness-exec to k8s-worker04
  Normal   Pulled     78s                kubelet            Container image "busybox" already present on machine
  Normal   Created    78s                kubelet            Created container liveness
  Normal   Started    78s                kubelet            Started container liveness
  Warning  Unhealthy  34s (x3 over 44s)  kubelet            Liveness probe failed: cat: can't open '/tmp/healthy': No such file or directory
  Normal   Killing    34s                kubelet            Stopping container liveness

5.4 liveness-httpget example

5.4.1 Write the YAML file

[root@k8s-master01 yaml]# cat pod-liveness-http.yaml 
apiVersion: v1
kind: Namespace
metadata:
  name: liveness
---
apiVersion: v1
kind: Pod
metadata:
  name: liveness-http
  namespace: liveness
  labels:
    zone: A
    region: wh
spec:
  containers:
  - name: http
    image: "nginx:latest"
    imagePullPolicy: IfNotPresent
    ports:                                # declare the container port; optional, since the image determines what it listens on
    - name: http                        # custom port name; the probe's port: http below refers to it (the port number could be used instead)
      containerPort: 80                    # similar to EXPOSE 80 in a Dockerfile
    livenessProbe:
      httpGet:                            # use the httpGet mechanism
        port: http                        # the named port above (80 could be written directly)
        path: /index.html                # probe index.html under the web root
      initialDelaySeconds: 3            # start probing 3 s after startup
      periodSeconds: 5                    # probe every 5 s

5.4.2 Check syntax and apply

[root@k8s-master01 yaml]# kubectl apply -f pod-liveness-http.yaml --dry-run=client
namespace/liveness configured (dry run)
pod/liveness-http created (dry run)

[root@k8s-master01 yaml]# kubectl apply -f pod-liveness-http.yaml
namespace/liveness unchanged
pod/liveness-http created

5.4.3 Verify

[root@k8s-master01 yaml]# kubectl get pod -n liveness liveness-http -o wide
NAME            READY   STATUS    RESTARTS   AGE   IP               NODE           NOMINATED NODE   READINESS GATES
liveness-http   1/1     Running   0          46s   10.244.203.211   k8s-worker04   <none>           <none>

[root@k8s-master01 yaml]# kubectl describe pod -n liveness liveness-http 
…………
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  57s   default-scheduler  Successfully assigned liveness/liveness-http to k8s-worker04
  Normal  Pulled     57s   kubelet            Container image "nginx:latest" already present on machine
  Normal  Created    57s   kubelet            Created container http
  Normal  Started    57s   kubelet            Started container http

5.4.4 Interactively delete nginx's index page

[root@k8s-master01 yaml]# kubectl exec -it -n liveness liveness-http -- rm -rf /usr/share/nginx/html/index.html

5.4.5 Verify again

[root@k8s-master01 yaml]# kubectl describe pod -n liveness liveness-http 
…………
Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  4m37s                default-scheduler  Successfully assigned liveness/liveness-http to k8s-worker04
  Normal   Pulled     38s (x2 over 4m37s)  kubelet            Container image "nginx:latest" already present on machine
  Normal   Created    38s (x2 over 4m37s)  kubelet            Created container http
  Normal   Started    38s (x2 over 4m37s)  kubelet            Started container http
  Warning  Unhealthy  38s (x3 over 48s)    kubelet            Liveness probe failed: HTTP probe failed with statuscode: 404
  Normal   Killing    38s                  kubelet            Container http failed liveness probe, will be restarted
  # The probe finds the index page missing and gets a 404, so the container is restarted; after the restart the pod no longer fails, because the image's default index page is back.

[root@k8s-master01 yaml]# kubectl exec -it -n liveness liveness-http -- ls /usr/share/nginx/html
50x.html  index.html

5.5 liveness-tcp example

5.5.1 Write the YAML file

[root@k8s-master01 yaml]# cat pod-liveness-tcp.yaml 
apiVersion: v1
kind: Namespace
metadata:
  name: liveness
---
apiVersion: v1
kind: Pod
metadata:
  name: liveness-tcp
  namespace: liveness
  labels:
    zone: A
    region: wh
spec:
  containers:
  - name: tcp
    image: "nginx:latest"
    imagePullPolicy: IfNotPresent
    ports:
    - name: http
      containerPort: 80
    livenessProbe:
      tcpSocket:            # use the TCP connection mechanism
        port: 80            # probe by connecting to port 80
      initialDelaySeconds: 3
      periodSeconds: 5

5.5.2 Check syntax and apply

[root@k8s-master01 yaml]# kubectl apply -f pod-liveness-tcp.yaml --dry-run=client
namespace/liveness configured (dry run)
pod/liveness-tcp configured (dry run)

[root@k8s-master01 yaml]# kubectl apply -f pod-liveness-tcp.yaml
namespace/liveness unchanged
pod/liveness-tcp created

5.5.3 Verify

[root@k8s-master01 yaml]# kubectl get pod -n liveness liveness-tcp 
NAME           READY   STATUS    RESTARTS   AGE
liveness-tcp   1/1     Running   0          104s

[root@k8s-master01 yaml]# kubectl describe pod -n liveness liveness-tcp 
…………
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  119s  default-scheduler  Successfully assigned liveness/liveness-tcp to k8s-worker02
  Normal  Pulled     119s  kubelet            Container image "nginx:latest" already present on machine
  Normal  Created    119s  kubelet            Created container tcp
  Normal  Started    118s  kubelet            Started container tcp

5.5.4 Interactively stop nginx

[root@k8s-master01 yaml]# kubectl exec -it -n liveness liveness-tcp -- nginx  -s stop
2024/02/19 03:54:37 [notice] 55#55: signal process started

5.5.5 Observe again

[root@k8s-master01 yaml]# kubectl describe pod -n liveness liveness-tcp 
…………
Events:
  Type     Reason     Age                 From               Message
  ----     ------     ----                ----               -------
  Normal   Scheduled  6m48s               default-scheduler  Successfully assigned liveness/liveness-tcp to k8s-worker02
  Warning  BackOff    18s (x2 over 19s)   kubelet            Back-off restarting failed container
  Normal   Pulled     5s (x3 over 6m48s)  kubelet            Container image "nginx:latest" already present on machine
  Normal   Created    5s (x3 over 6m48s)  kubelet            Created container tcp
  Normal   Started    5s (x3 over 6m47s)  kubelet            Started container tcp
  
  
[root@k8s-master01 yaml]# kubectl get pod -n liveness liveness-tcp 
NAME           READY   STATUS    RESTARTS   AGE
liveness-tcp   1/1     Running   2          8m23s
# RESTARTS is 2, meaning the container has already been restarted twice.

5.6 readiness example with httpGet

Probes the container; if a problem is found, the container is marked unhealthy and unavailable instead of being restarted.

5.6.1 Write the YAML file

[root@k8s-master01 yaml]# cat pod-readiness-http.yaml 
apiVersion: v1
kind: Namespace
metadata:
  name: readiness
---
apiVersion: v1
kind: Pod
metadata:
  name: readiness-http
  namespace: readiness
  labels:
    zone: A
    region: wh
spec:
  containers:
  - name: http
    image: "nginx:latest"
    imagePullPolicy: IfNotPresent
    ports:
    - name: http
      containerPort: 80
    readinessProbe:                            # livenessProbe has been replaced with readinessProbe here
      httpGet:
        port: http
        path: /index.html
      initialDelaySeconds: 3
      periodSeconds: 5

5.6.2 Check syntax and apply

[root@k8s-master01 yaml]# kubectl apply -f pod-readiness-http.yaml --dry-run=client
namespace/readiness created (dry run)
pod/readiness-http created (dry run)

[root@k8s-master01 yaml]# kubectl apply -f pod-readiness-http.yaml
namespace/readiness created
pod/readiness-http created

5.6.3 Verify

[root@k8s-master01 yaml]# kubectl get pod -n readiness readiness-http -o wide
NAME             READY   STATUS    RESTARTS   AGE   IP               NODE           NOMINATED NODE   READINESS GATES
readiness-http   1/1     Running   0          28s   10.244.203.212   k8s-worker04   <none>           <none>

[root@k8s-master01 yaml]# kubectl describe pod -n readiness readiness-http 
…………
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  38s   default-scheduler  Successfully assigned readiness/readiness-http to k8s-worker04
  Normal  Pulled     38s   kubelet            Container image "nginx:latest" already present on machine
  Normal  Created    38s   kubelet            Created container http
  Normal  Started    38s   kubelet            Started container http

5.6.4 Interactively delete the index page

[root@k8s-master01 yaml]# kubectl exec -it -n readiness readiness-http -- rm -rf /usr/share/nginx/html/index.html

5.6.5 Verify

[root@k8s-master01 yaml]# kubectl describe pod -n readiness readiness-http | tail -8
Events:
  Type     Reason     Age               From               Message
  ----     ------     ----              ----               -------
  Normal   Scheduled  11m               default-scheduler  Successfully assigned readiness/readiness-http to k8s-worker04
  Normal   Pulled     11m               kubelet            Container image "nginx:latest" already present on machine
  Normal   Created    11m               kubelet            Created container http
  Normal   Started    11m               kubelet            Started container http
  Warning  Unhealthy  3s (x3 over 13s)  kubelet            Readiness probe failed: HTTP probe failed with statuscode: 404
  # A 404 is reported, but the container is not restarted, which is why RESTARTS below stays at 0
  
[root@k8s-master01 yaml]# kubectl get pod -n readiness readiness-http 
NAME             READY   STATUS    RESTARTS   AGE
readiness-http   0/1     Running   0          12m

5.6.6 Interactively recreate the index page

[root@k8s-master01 yaml]# kubectl exec -it -n readiness readiness-http -- touch /usr/share/nginx/html/index.html

5.6.7 Verify again

[root@k8s-master01 yaml]# kubectl get pod -n readiness readiness-http -o wide
NAME             READY   STATUS    RESTARTS   AGE   IP               NODE           NOMINATED NODE   READINESS GATES
readiness-http   1/1     Running   0          13m   10.244.203.212   k8s-worker04   <none>           <none>

5.7 Combined readiness + liveness example

In real work, combining readiness and liveness probes is the recommended approach.

5.7.1 Write the YAML file

[root@k8s-master01 yaml]# cat pod-liveness-readiness-http.yaml 
apiVersion: v1
kind: Namespace
metadata:
  name: readiness-liveness
---
apiVersion: v1
kind: Pod
metadata:
  name: readiness-liveness-http
  namespace: readiness-liveness
  labels:
    zone: A
    region: wh
spec:
  containers:
  - name: http
    image: "nginx:latest"
    imagePullPolicy: IfNotPresent
    ports:
    - name: http
      containerPort: 80
    livenessProbe:
      httpGet:
        port: http
        path: /index.html
      initialDelaySeconds: 1
      periodSeconds: 3
    readinessProbe:
      httpGet:
        port: http
        path: /index.html
      initialDelaySeconds: 5
      periodSeconds: 5

5.7.2、Check the syntax and apply

[root@k8s-master01 yaml]# kubectl apply -f pod-liveness-readiness-http.yaml --dry-run=client
namespace/readiness-liveness configured (dry run)
pod/readiness-liveness-http configured (dry run)

[root@k8s-master01 yaml]# kubectl apply -f pod-liveness-readiness-http.yaml 
namespace/readiness-liveness created
pod/readiness-liveness-http created

5.7.3、Verify

[root@k8s-master01 yaml]# kubectl get pod -n readiness-liveness readiness-liveness-http -o wide
NAME                      READY   STATUS    RESTARTS   AGE   IP             NODE           NOMINATED NODE   READINESS GATES
readiness-liveness-http   1/1     Running   0          52s   10.244.79.76   k8s-worker01   <none>           <none>

[root@k8s-master01 yaml]# kubectl describe pod -n readiness-liveness readiness-liveness-http | tail -8
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  70s   default-scheduler  Successfully assigned readiness-liveness/readiness-liveness-http to k8s-worker01
  Normal  Pulled     70s   kubelet            Container image "nginx:latest" already present on machine
  Normal  Created    70s   kubelet            Created container http
  Normal  Started    70s   kubelet            Started container http
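
To contrast the two probes, you can repeat the earlier index.html deletion against this pod. Because the liveness probe now fails as well, the kubelet restarts the container (RESTARTS increases) instead of only marking it NotReady, and since the restarted container ships with a fresh index.html it becomes Ready again on its own. A rough sketch of the experiment:

# Remove the page that both probes check
kubectl exec -it -n readiness-liveness readiness-liveness-http -- rm -f /usr/share/nginx/html/index.html

# Watch READY drop and RESTARTS increase, then recover after the restart
kubectl get pod -n readiness-liveness readiness-liveness-http -w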

6、postStart

6.1、Write the YAML file

[root@k8s-master01 yaml]# cat pod-poststart.yaml 
apiVersion: v1
kind: Namespace
metadata:
  name: poststart
---
apiVersion: v1
kind: Pod
metadata:
  name: poststart
  namespace: poststart
  labels:
    zone: A
    region: wh
spec:
  containers:
  - name: poststart
    image: "nginx:latest"
    imagePullPolicy: IfNotPresent
    lifecycle:                                # lifecycle hook triggered after the container starts; here it creates a directory
      postStart:
        exec:
          command: ["mkdir", "-p", "/usr/share/nginx/html/haha"]

6.2、Check the syntax and apply

[root@k8s-master01 yaml]# kubectl apply -f pod-poststart.yaml --dry-run=client
namespace/poststart created (dry run)
pod/poststart created (dry run)

[root@k8s-master01 yaml]# kubectl apply -f pod-poststart.yaml
namespace/poststart created
pod/poststart created

6.3、Verify

[root@k8s-master01 yaml]# kubectl get pod -n poststart poststart -o wide
NAME        READY   STATUS    RESTARTS   AGE   IP              NODE           NOMINATED NODE   READINESS GATES
poststart   1/1     Running   0          38s   10.244.39.206   k8s-worker03   <none>           <none>

[root@k8s-master01 yaml]# kubectl describe pod -n poststart poststart | tail -8
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  50s   default-scheduler  Successfully assigned poststart/poststart to k8s-worker03
  Normal  Pulled     50s   kubelet            Container image "nginx:latest" already present on machine
  Normal  Created    50s   kubelet            Created container poststart
  Normal  Started    50s   kubelet            Started container poststart
  
[root@k8s-master01 yaml]# kubectl exec -it  -n poststart poststart -- ls /usr/share/nginx/html -l
total 8
-rw-r--r-- 1 root root 497 Dec 28  2021 50x.html
drwxr-xr-x 2 root root   6 Feb 19 06:47 haha            # the directory was created by the postStart hook
-rw-r--r-- 1 root root 615 Dec 28  2021 index.html
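
Note that postStart runs concurrently with the container's entrypoint; Kubernetes only guarantees the hook fires after the container is created, and if the hook fails the kubelet kills the container and restarts it according to restartPolicy. The hook's outcome is not written to the container log, so check the pod's events instead. A small sketch:

# List the events recorded for this pod; a failing hook (e.g. a hypothetical command: ["/bin/false"])
# would show up here as a FailedPostStartHook warning
kubectl get events -n poststart --field-selector involvedObject.name=poststart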

7、preStop

7.1、Write the YAML file

[root@k8s-master01 yaml]# cat pod-poststop.yaml 
apiVersion: v1
kind: Namespace
metadata:
  name: poststop
---
apiVersion: v1
kind: Pod
metadata:
  name: poststop
  namespace: poststop
  labels:
    zone: A
    region: wh
spec:
  containers:
  - name: poststop
    image: "nginx:latest"
    imagePullPolicy: IfNotPresent
    lifecycle:                                # lifecycle hooks
      preStop:                                # preStop: runs before the container is terminated
        exec:
          command: ["/bin/sh","-c","sleep 60000000"]            # sleep 60000000 seconds before the container exits

7.2、Check the syntax and apply

[root@k8s-master01 yaml]# kubectl apply -f pod-poststop.yaml --dry-run=client
namespace/poststop created (dry run)
pod/poststop created (dry run)

[root@k8s-master01 yaml]# kubectl apply -f pod-poststop.yaml
namespace/poststop created
pod/poststop created

7.3、Check and verify

[root@k8s-master01 yaml]# kubectl get pod -n poststop poststop -o wide
NAME       READY   STATUS    RESTARTS   AGE   IP              NODE           NOMINATED NODE   READINESS GATES
poststop   1/1     Running   0          68s   10.244.39.207   k8s-worker03   <none>           <none>

[root@k8s-master01 yaml]# kubectl describe pod -n poststop poststop | tail -8
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  84s   default-scheduler  Successfully assigned poststop/poststop to k8s-worker03
  Normal  Pulled     84s   kubelet            Container image "nginx:latest" already present on machine
  Normal  Created    84s   kubelet            Created container poststop
  Normal  Started    84s   kubelet            Started container poststop

7.4、Delete the pod to verify

[root@k8s-master01 yaml]# kubectl delete -f pod-poststop.yaml 
namespace "poststop" deleted
pod "poststop" deleted                    # 会在这一步等待一定时间,大概30-60s才能删除,说明验证成功

Conclusion: when something (here the preStop hook) keeps the pod from shutting down normally, Kubernetes waits roughly 30 seconds and then terminates it forcibly.
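
The wait comes from the pod's termination grace period (spec.terminationGracePeriodSeconds, 30 seconds by default): the kubelet runs the preStop hook first and sends SIGKILL when the grace period expires, even if the hook is still running. The grace period can also be overridden at delete time. A rough sketch:

# Give the pod only 10 seconds to finish its preStop hook before it is killed
kubectl delete pod -n poststop poststop --grace-period=10

# Or skip the wait entirely (force kill, use with care)
kubectl delete pod -n poststop poststop --grace-period=0 --force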

十一、Pod troubleshooting

1、Pod statuses:

1)Pending: the Pod has been submitted to Kubernetes but cannot be created yet for some reason, e.g. the image is slow to download or scheduling failed.
2)Running: the Pod has been bound to a node and all of its containers have been created; at least one container is running or still starting.
3)Succeeded: all containers in the Pod have terminated successfully and will not be restarted.
4)Failed: all containers in the Pod have terminated, and at least one of them terminated in failure.
5)Unknown: the apiserver cannot obtain the Pod's status for some reason, usually because the master has lost contact with the node hosting the Pod.
6)CrashLoopBackOff: most often caused by a wrong CMD or a missing container entrypoint, so the container exits immediately; use kubectl logs to troubleshoot (see the sketch after this list).
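
The phase itself can be read straight from the Pod object, which is handy in scripts. A small sketch using kubectl's jsonpath output (<pod-name> is a placeholder):

# Print only the phase: Pending / Running / Succeeded / Failed / Unknown
kubectl get pod <pod-name> -o jsonpath='{.status.phase}{"\n"}'

# CrashLoopBackOff is not a phase but a container waiting reason
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].state.waiting.reason}{"\n"}'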

2、Common troubleshooting commands:

kubectl get pod <pod-name> -o yaml                 # check whether the Pod's configuration is correct
kubectl describe pod <pod-name>                    # view the Pod's events
kubectl logs <pod-name> [-c <container-name>]      # view the container's logs
kubectl exec -it <pod-name> -- /bin/bash           # enter the container for inspection

3、Common failure categories

Pod stuck in Pending
Pod stuck in Waiting
Pod stuck in ContainerCreating
Pod in ImagePullBackOff
Pod in CrashLoopBackOff
Pod in Error
Pod stuck in Terminating
Pod in Unknown

4、Cause analysis:

Pod -- Pending

Pending means the Pod has not yet been scheduled onto a Node. Run
kubectl describe pod <pod-name> to view the Pod's current events and work out why it has not been scheduled. Possible causes include:

Insufficient resources: no Node in the cluster satisfies the CPU, memory, GPU, etc. requested by the Pod (see the sketch after this list)
HostPort already occupied; it is usually better to expose the port through a Service instead
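
To pin down a resource shortfall, compare what the Pod requests with what each Node has already committed. A rough sketch (exact output differs between versions):

# The scheduler's reason is printed in the Events section at the end of describe
kubectl describe pod <pod-name> | tail -20

# Per-node summary of CPU/memory already requested by running pods
kubectl describe nodes | grep -A 8 "Allocated resources"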

Pod -- Waiting or ContainerCreating

Again, start with kubectl describe pod <pod-name> to view the Pod's current events. Possible causes include:

Image pull failure, e.g. a misconfigured image name, the kubelet cannot reach the registry, wrong credentials for a private image, or an image so large that the pull times out

CNI network errors; check the CNI plugin's configuration, e.g. the Pod network cannot be set up or no IP address can be allocated

The container cannot start; check whether the correct image was built and whether the container arguments are correct (the recent namespace events usually point at the cause, see the sketch below)
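
All three causes also surface as namespace events, so when the describe output is unclear it helps to list recent events sorted by time. A small sketch:

# Most recent events last; look for image pull failures, sandbox/CNI creation errors, and so on
kubectl get events -n <namespace> --sort-by=.lastTimestamp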

Pod -- ImagePullBackOff

This is also common in our test environment and usually means the image pull failed. Run docker pull <image> to verify that the image can be pulled normally,

or docker images | grep <image> to check whether the image already exists locally (the system sometimes removes images automatically when resources are tight).
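
If the pull fails because the image lives in a private registry, creating a pull secret and referencing it from the Pod usually fixes it. A sketch with a hypothetical registry address and placeholder credentials:

# Create a registry credential secret (server, user and password are placeholders)
kubectl create secret docker-registry regcred \
  --docker-server=registry.example.com \
  --docker-username=<user> \
  --docker-password=<password>

# Then reference it in the Pod spec via spec.imagePullSecrets[0].name: regcred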

Pod -- CrashLoopBackOff
CrashLoopBackOff means the container did start but then exited abnormally. Start by checking the container logs:
kubectl logs <pod-name>
kubectl logs <pod-name> --previous
The logs usually reveal why the container exited (see also the sketch below), for example:

the container process itself exited
a health check failed and the container was terminated
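
Besides the logs, the reason and exit code of the last failed run are recorded on the Pod object itself and can be read directly. A small jsonpath sketch (<pod-name> is a placeholder):

# Reason and exit code of the previous container termination
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}{"\n"}'
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}{"\n"}'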

Pod -- Error
Error usually means something went wrong while the Pod was starting up. Common causes include:

a ConfigMap, Secret or PV that the Pod depends on does not exist
the requested resources exceed limits set by the administrator, e.g. a LimitRange
the Pod violates the cluster's security policy, e.g. a PodSecurityPolicy
the container is not authorized to operate on cluster resources, e.g. with RBAC enabled the ServiceAccount needs a role binding (see the sketch after this list)
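
For the RBAC case, kubectl auth can-i can confirm whether the Pod's ServiceAccount actually has the permission it needs. A sketch with placeholder namespace and ServiceAccount names:

# Check whether the ServiceAccount is allowed to list pods in its namespace
kubectl auth can-i list pods --as=system:serviceaccount:<namespace>:<serviceaccount> -n <namespace>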

Pod -- Terminating or Unknown
Since v1.5, Kubernetes no longer deletes Pods running on a Node just because the Node has lost contact with the cluster; instead it marks them Terminating or Unknown. There are three ways to remove Pods in these states:

Delete the Node from the cluster. On public clouds, kube-controller-manager deletes the corresponding Node object automatically after the VM is removed; on bare-metal clusters the administrator has to delete the Node manually (e.g. kubectl delete node <node-name>).
Wait for the Node to recover. The kubelet re-synchronizes with kube-apiserver, confirms the expected state of these Pods, and then decides whether to delete them or keep them running.
Force deletion by the user: kubectl delete pods <pod> --grace-period=0 --force. Unless you are certain the Pod has really stopped (e.g. the VM or physical machine hosting the Node is powered off), this is not recommended,
especially for Pods managed by a StatefulSet, where force deletion can easily lead to split-brain or data loss.
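
If a Pod stays in Terminating even though its Node is healthy, a stuck finalizer is another common cause; it can be inspected and, as a last resort, cleared with the same caution as a force delete. A small sketch:

# Show any finalizers still attached to the Pod
kubectl get pod <pod-name> -o jsonpath='{.metadata.finalizers}{"\n"}'

# Remove them so the deletion can complete (last resort only)
kubectl patch pod <pod-name> -p '{"metadata":{"finalizers":null}}'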

Pod -- Evicted

This usually happens when the node runs short of memory or disk. Run df -h to check the filesystem backing docker's storage; if usage is above 85%, clean up promptly, especially large files and docker images.

Delete pods in the Evicted state:

kubectl get pods | grep Evicted | awk '{print $1}' | xargs kubectl delete pod

Delete all pods whose status is not Running:

kubectl delete pods $(kubectl get pods --no-headers | grep -v Running | awk '{print $1}')

Remove docker images that are no longer in use (use with caution):

docker system prune -a

Check the service (image) version corresponding to a pod:

kubectl --server=127.0.0.1:8888 get rc -o yaml | grep image: |uniq | sort | grep ecs-core




Appendix:

Delete old images of a given type, keeping only the tag currently in use:

docker images | grep ecs-core | grep -v `docker images | grep ecs-core -m 1 | awk '{print $2}'` | awk '{print $3}' | xargs docker rmi        # keep only the first-listed (current) ecs-core tag and remove the rest