1. Affinity Configuration in Detail

1.1 Node Affinity Configuration

1. YAML example

apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/e2e-az-name
            operator: In
            values:
            - e2e-az1
            - az-2
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: another-node-label-key
            operator: In
            values:
            - another-node-label-value
  containers:
  - name: with-node-affinity
    image: nginx

With the configuration above, the Pod can only be scheduled onto nodes that carry a label whose key is kubernetes.io/e2e-az-name and whose value is e2e-az1 or az-2. Because a soft (preferred) affinity is also configured, among the nodes that satisfy that requirement the scheduler will try to place the Pod on a node labeled another-node-label-key=another-node-label-value. This preference is not mandatory; nodes without that label do not block scheduling.

The parameters are explained below:

(1) requiredDuringSchedulingIgnoredDuringExecution: hard (required) affinity

  • nodeSelectorTerms: the node selector. Multiple matchExpressions may be configured (any one of them needs to match); each matchExpressions may contain multiple key/value selectors (all of which must match); and values may list multiple values (any one of which needs to match)

(2) preferredDuringSchedulingIgnoredDuringExecution: soft (preferred) affinity

  • weight: the weight of the soft affinity rule; the higher the weight, the higher the priority. The valid range is 1-100

  • preference: the soft affinity term, at the same level as weight; multiple entries may be configured. matchExpressions works the same way as in the hard affinity
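For illustration, a Pod can declare several preferences with different weights. The sketch below is not from the original article: the Pod name is made up, while the ssd=true and region=daxing labels match node labels shown later in this article. It prefers SSD nodes strongly and the daxing region weakly:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: multi-preference-demo        # hypothetical name
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 80                   # strong preference: nodes labeled ssd=true
        preference:
          matchExpressions:
          - key: ssd
            operator: In
            values:
            - "true"
      - weight: 20                   # weak preference: nodes in the daxing region
        preference:
          matchExpressions:
          - key: region
            operator: In
            values:
            - daxing
  containers:
  - name: nginx
    image: nginx
```

For each candidate node the scheduler sums the weights of the preferences that node satisfies and favors the node with the highest total.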

(3) operator: how labels are matched

  • In: equivalent to key = value

  • NotIn: equivalent to key != value

  • Exists: matches as long as the node has a label with the given key, regardless of its value; the values field must not be set

  • DoesNotExist: matches as long as the node does not have a label with the given key; the values field must not be set

  • Gt: the label value is greater than the specified value (the value is interpreted as an integer)

  • Lt: the label value is less than the specified value (the value is interpreted as an integer)
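A hard affinity term combining these operators might look like the sketch below. The Pod name and the cpu-count label are made-up examples (the gpu=true label does appear on k8s-master01 later in this article):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: operator-demo                # hypothetical name
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: gpu                 # Exists: the node must carry the gpu label, any value
            operator: Exists
          - key: cpu-count           # Gt: the label value, parsed as an integer, must be > 8
            operator: Gt
            values:
            - "8"
  containers:
  - name: nginx
    image: nginx
```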

Note: if both nodeSelector and nodeAffinity are specified, both must be satisfied for the Pod to be scheduled. If multiple nodeSelectorTerms are configured, satisfying any one of them is enough; if a term contains multiple matchExpressions, all of them must be satisfied. If a label is removed from the node after the Pod has been scheduled, the Pod is not evicted; in other words, affinity rules only take effect at scheduling time (hence "IgnoredDuringExecution").
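A sketch of the first point: when both fields are present, a node must satisfy the nodeSelector and the required affinity term. The Pod name here is made up; the label values reuse ones shown later in this article:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: selector-and-affinity-demo   # hypothetical name
spec:
  nodeSelector:
    ssd: "true"                      # condition 1: the node must have ssd=true
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: region              # condition 2: the node must also be in the daxing region
            operator: In
            values:
            - daxing
  containers:
  - name: nginx
    image: nginx
```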

1.2 Pod Affinity and Anti-Affinity Configuration

NodeAffinity schedules based on the labels on nodes: a Pod can be placed on, or kept away from, nodes that carry certain labels. Pod affinity and anti-affinity instead match against the labels of other Pods. For example, if the Pods of service A must not run on the same node as Pods labeled service=b, Pod anti-affinity can express that; the scheduling decision is based on Pod labels rather than node labels.

1. YAML example

apiVersion: v1
kind: Pod
metadata:
  name: with-pod-affinity
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: security
            operator: In
            values:
            - S1
        topologyKey: failure-domain.beta.kubernetes.io/zone
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: security
              operator: In
              values:
              - S2
          namespaces:
          - default
          topologyKey: failure-domain.beta.kubernetes.io/zone
  containers:
  - name: with-pod-affinity
    image: nginx

The parameters are explained below:

  • labelSelector: the Pod label selector; multiple entries may be configured
  • matchExpressions: same as in node affinity
  • operator: same as in node affinity, except that Gt and Lt are not supported
  • topologyKey: the key of the topology domain to match, i.e. the key of a node label; nodes with the same key and value belong to the same domain. It can be used to mark different machine rooms or regions, for example
  • namespaces: the namespaces whose Pods are matched against; if empty, the Pod's own namespace is used
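A very common topologyKey is kubernetes.io/hostname, which every node carries with a unique value, so each node forms its own domain. As a sketch (the Pod name and the app=web label are made up), the following forbids two app=web Pods from sharing a node:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: one-per-node-demo            # hypothetical name
  labels:
    app: web                         # hypothetical label
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - web
        topologyKey: kubernetes.io/hostname   # every node is its own domain
  containers:
  - name: nginx
    image: nginx
```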

The Pod affinity above is a hard rule: the Pod must be placed in the same topology domain (keyed by failure-domain.beta.kubernetes.io/zone) as Pods labeled security=S1. The Pod anti-affinity is a soft rule: the scheduler will try not to place the Pod in the same domain as Pods labeled security=S2. So within the failure-domain.beta.kubernetes.io/zone domains, this Pod must be co-located with security=S1 Pods and will preferably avoid security=S2 Pods.

Because Pod labels are part of the Pod itself, they are namespaced just like the Pod. By default, Pod affinity and anti-affinity only match Pods in the same namespace; to match the labels of Pods in other namespaces, list those namespaces in the namespaces field.
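As a sketch (the Pod name and the prod namespace are made-up examples), matching Pods in a different namespace is done through the namespaces field:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cross-namespace-demo         # hypothetical name
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: security
            operator: In
            values:
            - S1
        namespaces:
        - prod                       # match security=S1 Pods in prod, not in this Pod's own namespace
        topologyKey: kubernetes.io/hostname
  containers:
  - name: nginx
    image: nginx
```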

Note: Pod affinity and anti-affinity require a significant amount of computation and can noticeably slow down scheduling in large clusters. It is therefore not recommended to use them heavily in clusters with more than several hundred nodes.

2. The Topology Domain (TopologyKey) in Detail

A topology domain mainly targets the hosts: it divides the nodes into zones. Membership is decided by a node label; nodes with a different key, or a different value for the same key, belong to different topology domains.

The following demonstrates deploying one application across multiple zones:

1. Divide the nodes into regions using labels: k8s-master01 and k8s-master02 belong to the daxing region, k8s-master03 and k8s-node01 to the chaoyang region, and k8s-node02 to region xx

[root@k8s-master01 study]# kubectl label node k8s-master01 k8s-master02 region=daxing
[root@k8s-master01 study]# kubectl label node k8s-master03 k8s-node01 region=chaoyang
[root@k8s-master01 study]# kubectl label node k8s-node02 region=xx

2. Verify that the labels have been applied

[root@k8s-master01 study]# kubectl get node -lregion --show-labels
NAME           STATUS   ROLES                  AGE   VERSION    LABELS
k8s-master01   Ready    control-plane,master   8d    v1.23.14   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,gpu=true,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-master01,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/exclude-from-external-load-balancers=,region=daxing,ssd=true
k8s-master02   Ready    control-plane,master   8d    v1.23.14   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-master02,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/exclude-from-external-load-balancers=,region=daxing
k8s-master03   Ready    control-plane,master   8d    v1.23.14   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-master03,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/exclude-from-external-load-balancers=,region=chaoyang
k8s-node01     Ready    <none>                 8d    v1.23.14   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node01,kubernetes.io/os=linux,region=chaoyang
k8s-node02     Ready    <none>                 8d    v1.23.14   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node02,kubernetes.io/os=linux,region=xx,type=physical

3. Define a YAML file

[root@k8s-master01 study]# vim podAntiAffinity03.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: must-be-diff-zone
  name: must-be-diff-zone
  namespace: kube-public
spec:
  replicas: 3
  selector:
    matchLabels:
      app: must-be-diff-zone
  template:
    metadata:
      labels:
        app: must-be-diff-zone
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - must-be-diff-zone
            topologyKey: region
      containers:
      - image: registry.cn-hangzhou.aliyuncs.com/zq-demo/nginx:1.14.2
        imagePullPolicy: IfNotPresent
        name: must-be-diff-zone

4. Deploy

[root@k8s-master01 study]# kubectl create -f podAntiAffinity03.yaml

5. Check the Pod status

[root@k8s-master01 study]# kubectl get po -n kube-public -owide
NAME                                 READY   STATUS    RESTARTS   AGE   IP              NODE         NOMINATED NODE   READINESS GATES
must-be-diff-zone-79dfd48799-2nlz5   1/1     Running   0          44s   172.17.125.4    k8s-node01   <none>           <none>
must-be-diff-zone-79dfd48799-bbkk2   0/1     Pending   0          44s   <none>          <none>       <none>           <none>
must-be-diff-zone-79dfd48799-qf4vg   1/1     Running   0          44s   172.27.14.199   k8s-node02   <none>           <none>

6. Check the node taints. One Pod is Pending because the chaoyang and xx regions are already occupied, so it can only go to the daxing region, where both nodes are masters and carry taints

[root@k8s-master01 study]# kubectl describe node | grep Taint
Taints:             node-role.kubernetes.io/master:NoSchedule
Taints:             node-role.kubernetes.io/master:NoSchedule
Taints:             node-role.kubernetes.io/master:NoSchedule
Taints:             <none>
Taints:             <none>

7. Remove the taints

[root@k8s-master01 study]# kubectl  taint node  -l node-role.kubernetes.io/master node-role.kubernetes.io/master:NoSchedule-
node/k8s-master01 untainted
node/k8s-master02 untainted
node/k8s-master03 untainted

[root@k8s-master01 study]# kubectl describe node | grep Taint
Taints:             <none>
Taints:             <none>
Taints:             <none>
Taints:             <none>
Taints:             <none>

8. Check the Pod status again; the application is now deployed across multiple regions

[root@k8s-master01 study]# kubectl get po -n kube-public -owide
NAME                                 READY   STATUS    RESTARTS   AGE   IP               NODE           NOMINATED NODE   READINESS GATES
must-be-diff-zone-79dfd48799-2nlz5   1/1     Running   0          37m   172.17.125.4     k8s-node01     <none>           <none>
must-be-diff-zone-79dfd48799-bbkk2   1/1     Running   0          37m   172.25.244.193   k8s-master01   <none>           <none>
must-be-diff-zone-79dfd48799-qf4vg   1/1     Running   0          37m   172.27.14.199    k8s-node02     <none>           <none>