k8s Pod Scheduling

Pod Scheduling - Basics (fields defined on the Pod)

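Affinity, nodeSelector and taints are all evaluated by the scheduler. If the scheduler is down and you need a Pod placed on a specific node quickly, you can use nodeName instead.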

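Pod.spec.nodeSelector selects nodes through the Kubernetes label-selector mechanism: the scheduler matches the labels as part of its scheduling policy and then schedules the Pod onto a matching node. This matching rule is a [hard] constraint. Because placement goes through the scheduler, nodeSelector cannot override taints.
pod.spec.nodeName assigns the Pod directly to the named node, [skipping the scheduler's policies entirely]; this is also a [hard] match. Because the scheduler is bypassed, NoSchedule taints do not block it, although a NoExecute taint on that node is still enforced against a Pod that does not tolerate it.
- Limitations of using nodeName to select a node:
  - If the named node does not exist, the Pod will not run, and in some cases it may be deleted automatically.
  - If the named node does not have enough resources to accommodate the Pod, the Pod fails and the reason is reported, for example OutOfmemory or OutOfcpu.
  - Node names in cloud environments are not always predictable or stable.

Example: the Pod below can only be scheduled onto nodes labeled disktype=ssd (nodeSelector).
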
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    env: test
spec:
  containers:
  - name: nginx
    image: docker.io/nginx
    imagePullPolicy: IfNotPresent
  nodeSelector:
    disktype: ssd
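For comparison, a minimal sketch of pinning a Pod with nodeName. The node name kube-01 and the Pod name nginx-pinned are hypothetical; use a real name from kubectl get nodes. No scheduler is involved, so the kubelet on that node picks the Pod up directly.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pinned          # hypothetical Pod name
spec:
  nodeName: kube-01           # hypothetical node name; the scheduler is skipped entirely
  containers:
  - name: nginx
    image: docker.io/nginx
    imagePullPolicy: IfNotPresent
```

If kube-01 does not exist or lacks resources, the limitations listed above apply.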

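Pod Scheduling - Affinity (fields defined on the Pod)

Affinity falls into two main categories: nodeAffinity and podAffinity/podAntiAffinity.

Two forms are supported:

requiredDuringSchedulingIgnoredDuringExecution: hard requirement, like nodeSelector; the Pod is scheduled only if the rule is met
preferredDuringSchedulingIgnoredDuringExecution: soft preference; the scheduler tries to satisfy it but still schedules the Pod if it cannot

IgnoredDuringExecution means that if a node's labels change while the Pod is running so that the affinity rule is no longer satisfied, the Pod keeps running on that node.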

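operator supports the following operations:

In: the label's value is in a given list
NotIn: the label's value is not in a given list
Gt: the label's value is greater than a given value (nodeAffinity only)
Lt: the label's value is less than a given value (nodeAffinity only)
Exists: the label exists
DoesNotExist: the label does not exist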
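A sketch of one nodeAffinity term that combines several operators. The labels cpu-generation and gpu are hypothetical node labels used only for illustration. For Gt and Lt the values list must hold exactly one element, which is interpreted as an integer; Exists and DoesNotExist take no values.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: operator-demo
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:            # all expressions in this term must hold
          - key: cpu-generation        # hypothetical label, e.g. cpu-generation=4
            operator: Gt
            values: ["3"]              # exactly one value, parsed as an integer
          - key: gpu                   # hypothetical label; any value is accepted
            operator: Exists           # no values allowed
          - key: node-role.kubernetes.io/control-plane
            operator: DoesNotExist     # keep the Pod off control-plane nodes
          - key: kubernetes.io/arch
            operator: NotIn
            values: ["arm64"]
  containers:
  - name: nginx
    image: docker.io/nginx
```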

nodeAffinity

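Node affinity is conceptually similar to nodeSelector, but it is more expressive and lets you specify soft rules. It lets you constrain which nodes a Pod can be scheduled on based on the labels of those nodes.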

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/e2e-az-name
            operator: In
            values:
            - e2e-az1
            - e2e-az2
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1               # valid range: 1-100
        preference:
          matchExpressions:
          - key: another-node-label-key
            operator: In
            values:
            - another-node-label-value
  containers:
  - name: nginx
    image: docker.io/nginx

This Pod can only be scheduled onto nodes labeled kubernetes.io/e2e-az-name=e2e-az1 or kubernetes.io/e2e-az-name=e2e-az2.
Among the nodes that satisfy that condition, it prefers those that also carry the label another-node-label-key=another-node-label-value.
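When several preferred terms are listed, the scheduler adds up, for each node that passes the hard rules, the weight of every preferred term the node satisfies, and favors nodes with a higher total (alongside its other scoring). A sketch with two preferences; rack=r1 is a hypothetical label.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: weighted-prefs
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 80                 # strong preference for SSD nodes
        preference:
          matchExpressions:
          - key: disktype
            operator: In
            values: ["ssd"]
      - weight: 20                 # mild preference for a hypothetical rack label
        preference:
          matchExpressions:
          - key: rack              # hypothetical label
            operator: In
            values: ["r1"]
  containers:
  - name: nginx
    image: docker.io/nginx
```

Before normalization, a node with both labels sums to 100 for this rule, an SSD-only node to 80, and a rack=r1-only node to 20. Because both rules are soft, the Pod is still scheduled if no node matches either.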

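1) If you specify both nodeSelector and nodeAffinity, both must be satisfied for the Pod to be scheduled.

2) If you specify multiple nodeSelectorTerms, the Pod can be scheduled onto a node that satisfies any one of them.

3) If you specify multiple matchExpressions within a single nodeSelectorTerm, all of them must be satisfied for the Pod to be scheduled onto the node.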
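A sketch that exercises all three rules at once. The labels env=test and gpu are hypothetical. A node qualifies only if it has env=test AND (disktype=ssd AND kubernetes.io/arch=amd64, OR a gpu label).

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: terms-demo
spec:
  nodeSelector:
    env: test                      # hypothetical label; must also match (rule 1)
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        # Term 1: expressions are ANDed (rule 3)
        - matchExpressions:
          - key: disktype
            operator: In
            values: ["ssd"]
          - key: kubernetes.io/arch
            operator: In
            values: ["amd64"]
        # Term 2: terms are ORed, so a gpu node also qualifies (rule 2)
        - matchExpressions:
          - key: gpu               # hypothetical label
            operator: Exists
  containers:
  - name: nginx
    image: docker.io/nginx
```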

podAffinity

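Inter-pod affinity and anti-affinity let you constrain which nodes a Pod can be scheduled on based on the labels of Pods already running on those nodes, rather than on the labels of the nodes themselves.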

apiVersion: v1
kind: Pod
metadata:
  name: with-pod-affinity
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: security
            operator: In
            values:
            - S1
        topologyKey: failure-domain.beta.kubernetes.io/zone
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: security
              operator: In
              values:
              - S2
          topologyKey: kubernetes.io/hostname
  containers:
  - name: with-pod-affinity
    image: k8s.gcr.io/pause:2.0

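The podAffinity rule means the Pod can be scheduled onto node N only if N has the label key failure-domain.beta.kubernetes.io/zone and is in the same zone as at least one running Pod labeled security=S1.
The podAntiAffinity rule means that if a node is already running a Pod labeled security=S2, the scheduler prefers not to place this Pod there; because its topologyKey is kubernetes.io/hostname, this check is made per node.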
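A common use of hard pod anti-affinity is keeping replicas of one workload on different nodes. A sketch, assuming a Deployment named web whose Pods carry the hypothetical label app=web. With topologyKey kubernetes.io/hostname each node is its own topology domain, so no two replicas can share a node; if there are fewer eligible nodes than replicas, the extra replicas stay Pending.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                        # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app           # repel other Pods of this same Deployment
                operator: In
                values: ["web"]
            topologyKey: kubernetes.io/hostname   # one domain per node
      containers:
      - name: nginx
        image: docker.io/nginx
```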

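Taints and Tolerations

Node affinity is a property of Pods that attracts them to a set of nodes (either as a preference or as a hard requirement). Taints are the opposite: they allow a node to repel a set of Pods.
Tolerations are applied to Pods. A toleration allows the scheduler to schedule a Pod onto nodes with a matching taint. Tolerations allow scheduling but do not guarantee it: the scheduler also evaluates other parameters as part of its work.
Taints are set on nodes; tolerations are set on Pods.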

Adding a taint to a node

kubectl taint nodes node1 key1=value1:NoSchedule
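The command above records the taint in the Node object's spec.taints field. An abbreviated sketch of what kubectl get node node1 -o yaml would then contain (all other fields omitted). The taint can be removed by repeating the command with a trailing minus: kubectl taint nodes node1 key1=value1:NoSchedule-

```yaml
apiVersion: v1
kind: Node
metadata:
  name: node1
spec:
  taints:
  - key: key1
    value: value1
    effect: NoSchedule     # one of NoSchedule, PreferNoSchedule, NoExecute
```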

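The taint effect supports the following three values:

NoSchedule: Kubernetes will not schedule Pods that do not tolerate the taint onto the node
PreferNoSchedule: Kubernetes will try to avoid scheduling Pods that do not tolerate the taint onto the node, but this is not guaranteed
NoExecute: Kubernetes will not schedule Pods that do not tolerate the taint onto the node, and Pods already running on the node that do not tolerate it are evicted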
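For NoExecute, a toleration may also set tolerationSeconds, which limits how long an already-running Pod stays bound to the node after a matching taint is added before it is evicted. A sketch, assuming the node gets the hypothetical taint key1=value1:NoExecute; the Pod name is illustrative.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-noexecute      # hypothetical name
spec:
  containers:
  - name: nginx
    image: nginx
  tolerations:
  - key: "key1"
    operator: "Equal"
    value: "value1"
    effect: "NoExecute"
    tolerationSeconds: 3600  # remain up to 1 hour after the taint appears, then be evicted
```

Without tolerationSeconds the Pod tolerates the taint indefinitely; a Pod with no matching toleration is evicted immediately.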

Setting tolerations on a Pod

A Pod with either of the following tolerations can be scheduled onto node1:

tolerations:
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoSchedule"

tolerations:
- key: "key1"
  operator: "Exists"
  effect: "NoSchedule"

A complete Pod manifest using the Exists form of the toleration:

apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    env: test
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  tolerations:
  - key: "key1"
    operator: "Exists"
    effect: "NoSchedule"

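A toleration matches a taint when their key and effect are the same and either:
- the operator is Exists (in which case the toleration must not specify a value), or
- the operator is Equal and their values are equal.

There are two special cases:
If a toleration has an empty key and the operator Exists, it matches every key, value and effect, so it tolerates any taint.
If effect is empty, the toleration matches all effects for its key (key1 in the examples above).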
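The two special cases written out as Pod manifests; the Pod names are hypothetical and the two documents are separated by ---.

```yaml
# Case 1: empty key + Exists tolerates every taint (any key, value, effect)
apiVersion: v1
kind: Pod
metadata:
  name: tolerate-everything        # hypothetical name
spec:
  containers:
  - name: nginx
    image: nginx
  tolerations:
  - operator: "Exists"
---
# Case 2: no effect given, so it matches key1 taints of any effect
apiVersion: v1
kind: Pod
metadata:
  name: tolerate-key1-any-effect   # hypothetical name
spec:
  containers:
  - name: nginx
    image: nginx
  tolerations:
  - key: "key1"
    operator: "Exists"
```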
