Overview of steps

  1. Generate the etcd certificates

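--cert-file and --key-file secure the communication between the etcd server and its clients, while --peer-cert-file and --peer-key-file secure the communication between etcd members inside the cluster.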

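As long as a certificate is issued by a trusted CA and the etcd server is configured to trust that CA (via --trusted-ca-file), any client holding a valid client certificate issued by that CA can establish a secure connection to the etcd server.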

  2. Restore the etcd database
  3. Start the etcd cluster
  4. Restore the other components

Generate the etcd certificates

  1. Write the certificate request configuration file csr.conf:
[ req ]
default_bits = 2048
prompt = no
default_md = sha256
distinguished_name = dn
req_extensions = req_ext

[ dn ]
CN = etcd-cluster

[ req_ext ]
subjectAltName = @alt_names

[ alt_names ]
DNS.1 = etcd-cluster
DNS.2 = k8s-master-01
DNS.3 = k8s-master-02
DNS.4 = k8s-master-03
DNS.5 = localhost
IP.1 = 192.168.99.7
IP.2 = 192.168.99.8
IP.3 = 192.168.99.9
IP.4 = 127.0.0.1
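  2. Generate a new CSR together with a new key pair (public and private key), leave the private key unencrypted, and write the CSR to the given file according to the supplied configuration file.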

openssl req -new -nodes -text -out etcd-cluster.csr -config csr.conf -keyout etcd-cluster.key

  3. Inspect the details of the certificate signing request (CSR).

openssl req -in etcd-cluster.csr -noout -text

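  4. Sign the CSR with the CA certificate and private key to produce a new certificate, etcd-cluster.crt, valid for 2000 days and carrying the extensions specified in csr.conf.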

openssl x509 -req -in etcd-cluster.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out etcd-cluster.crt -days 2000 -extensions req_ext -extfile csr.conf
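Before distributing the certificate, it can help to confirm that it chains to the CA and that the subjectAltName entries survived signing. The following is a minimal sketch, not the procedure used above: it replays the same signing flow in a scratch directory with a throwaway CA (ca.conf and the CN demo-etcd-ca are assumptions made up for the demonstration, and the SAN list is shortened). On a real node only the last two openssl commands would be run against the real ca.crt and etcd-cluster.crt.

```shell
#!/bin/sh
set -eu

# Work in a scratch directory so no real key material is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Throwaway CA, for demonstration only (assumption: the real ca.crt/ca.key already exist).
cat > ca.conf <<'EOF'
[ req ]
prompt = no
distinguished_name = dn
x509_extensions = v3_ca

[ dn ]
CN = demo-etcd-ca

[ v3_ca ]
basicConstraints = critical, CA:TRUE
keyUsage = critical, keyCertSign, cRLSign
EOF
openssl req -x509 -newkey rsa:2048 -nodes -config ca.conf \
  -keyout ca.key -out ca.crt -days 1

# Shortened copy of csr.conf from step 1.
cat > csr.conf <<'EOF'
[ req ]
default_bits = 2048
prompt = no
default_md = sha256
distinguished_name = dn
req_extensions = req_ext

[ dn ]
CN = etcd-cluster

[ req_ext ]
subjectAltName = @alt_names

[ alt_names ]
DNS.1 = etcd-cluster
IP.1 = 192.168.99.7
IP.2 = 127.0.0.1
EOF

# Same CSR and signing commands as in steps 2 and 4 (shorter validity for the demo).
openssl req -new -nodes -out etcd-cluster.csr -config csr.conf -keyout etcd-cluster.key
openssl x509 -req -in etcd-cluster.csr -CA ca.crt -CAkey ca.key -CAcreateserial \
  -out etcd-cluster.crt -days 1 -extensions req_ext -extfile csr.conf

# Check 1: the certificate chains to the CA.
verify_result=$(openssl verify -CAfile ca.crt etcd-cluster.crt)
echo "$verify_result"

# Check 2: the SAN extension is present in the signed certificate.
san_list=$(openssl x509 -in etcd-cluster.crt -noout -text \
  | grep -A1 'Subject Alternative Name' | tail -n 1)
echo "$san_list"
```

If the SAN line comes back empty, the usual cause is a missing -extensions/-extfile pair on the signing command, because openssl x509 -req does not copy extensions from the CSR by default.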


Restore the etcd database

  • Empty the /var/lib/etcd/ directory on all 3 nodes, then restore the database on each node from the same snapshot, master02.db.

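  • <new_cluster_token> (the value passed to --initial-cluster-token) identifies the new cluster. Any string works, as long as it differs from the token of any other cluster and is identical on all three nodes.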

  • <new_cluster_token> is only used while the cluster is bootstrapped or restored from a snapshot; afterwards it has no effect on the running cluster.
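Emptying the data directory is destructive, so one option is to move the old directory aside instead of deleting it. The sketch below assumes etcd is already stopped on the node; backup_and_clear is a hypothetical helper, and the demonstration runs against a scratch directory rather than /var/lib/etcd.

```shell
#!/bin/sh
set -eu

# backup_and_clear DIR: move DIR aside with a timestamp suffix and
# recreate it empty. Prints the backup path. (Hypothetical helper.)
backup_and_clear() {
  dir=${1%/}                                   # tolerate a trailing slash
  backup="${dir}.bak.$(date +%Y%m%d%H%M%S)"
  if [ -d "$dir" ]; then
    mv "$dir" "$backup"
  fi
  mkdir -p "$dir"
  chmod 700 "$dir"                             # keep the directory private to its owner
  echo "$backup"
}

# Demonstration on a scratch directory; on a real node DIR would be /var/lib/etcd/.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/etcd/member"
backup_path=$(backup_and_clear "$demo_root/etcd/")

ls -A "$demo_root/etcd"    # empty: ready for the restore
ls "$backup_path"          # old contents are kept in the backup
```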

etcdctl snapshot restore /root/master02.db \
  --name k8s-master-01 \
  --initial-cluster k8s-master-01=https://192.168.99.7:2380,k8s-master-02=https://192.168.99.8:2380,k8s-master-03=https://192.168.99.9:2380 \
  --initial-cluster-token new-cluster-20240329 \
  --initial-advertise-peer-urls https://192.168.99.7:2380 \
  --data-dir /var/lib/etcd/


etcdctl snapshot restore /root/master02.db \
  --name k8s-master-02 \
  --initial-cluster k8s-master-01=https://192.168.99.7:2380,k8s-master-02=https://192.168.99.8:2380,k8s-master-03=https://192.168.99.9:2380 \
  --initial-cluster-token new-cluster-20240329 \
  --initial-advertise-peer-urls https://192.168.99.8:2380 \
  --data-dir /var/lib/etcd/

etcdctl snapshot restore /root/master02.db \
  --name k8s-master-03 \
  --initial-cluster k8s-master-01=https://192.168.99.7:2380,k8s-master-02=https://192.168.99.8:2380,k8s-master-03=https://192.168.99.9:2380 \
  --initial-cluster-token new-cluster-20240329 \
  --initial-advertise-peer-urls https://192.168.99.9:2380 \
  --data-dir /var/lib/etcd/
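The three restore commands above differ only in --name and --initial-advertise-peer-urls. As a rough sketch, a small helper can print the command for a given member so that the shared arguments are written once; restore_cmd is a hypothetical name, and the script only prints the commands, it does not run etcdctl.

```shell
#!/bin/sh
set -eu

# Values shared by all three members, taken from the commands above.
SNAPSHOT=/root/master02.db
TOKEN=new-cluster-20240329
INITIAL_CLUSTER="k8s-master-01=https://192.168.99.7:2380,k8s-master-02=https://192.168.99.8:2380,k8s-master-03=https://192.168.99.9:2380"

# restore_cmd NAME IP: print the restore command for one member.
# (Hypothetical helper; it prints only, nothing is executed.)
restore_cmd() {
  printf 'etcdctl snapshot restore %s --name %s --initial-cluster %s --initial-cluster-token %s --initial-advertise-peer-urls https://%s:2380 --data-dir /var/lib/etcd/\n' \
    "$SNAPSHOT" "$1" "$INITIAL_CLUSTER" "$TOKEN" "$2"
}

restore_cmd k8s-master-01 192.168.99.7
restore_cmd k8s-master-02 192.168.99.8
restore_cmd k8s-master-03 192.168.99.9
```

Each printed line is meant to be reviewed and then run on the matching node.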

Start the etcd cluster

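If you restored the etcd data on all three nodes with etcdctl snapshot restore and gave each node its member name and the cluster layout during the restore (through --name and --initial-cluster), the membership of the whole cluster is already preconfigured. In that case, in theory, there is no need to add members manually with etcdctl member add, because the restore has already written the member information on every node.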

Edit the etcd.yaml file on each node, then move it into /etc/kubernetes/manifests.

Update the following parameters:

--name

--initial-cluster

--cert-file

--key-file

--peer-cert-file

--peer-key-file

--peer-trusted-ca-file

--trusted-ca-file

  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://192.168.99.7:2379
    - --cert-file=/etc/kubernetes/pki/etcd/etcd-cluster.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --initial-advertise-peer-urls=https://192.168.99.7:2380
    - --initial-cluster=k8s-master-01=https://192.168.99.7:2380,k8s-master-02=https://192.168.99.8:2380,k8s-master-03=https://192.168.99.9:2380
    - --key-file=/etc/kubernetes/pki/etcd/etcd-cluster.key
    - --listen-client-urls=https://127.0.0.1:2379,https://192.168.99.7:2379
    - --listen-metrics-urls=http://127.0.0.1:2381
    - --listen-peer-urls=https://192.168.99.7:2380
    - --name=k8s-master-01
    - --peer-cert-file=/etc/kubernetes/pki/etcd/etcd-cluster.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/etcd-cluster.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --quota-backend-bytes=6294967296
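The manifest above is the one for k8s-master-01; the other two members need their own IP and --name, while --initial-cluster stays identical everywhere. A minimal sketch of that adaptation with sed follows. render_for_node is a hypothetical helper, the input is a four-line excerpt of the flags rather than the full etcd.yaml, and the plain substitution assumes no other address in the file begins with 192.168.99.7.

```shell
#!/bin/sh
set -eu

# render_for_node NAME IP: read the k8s-master-01 flags on stdin and rewrite
# them for another member. The --initial-cluster line is left untouched
# because it must list all three members on every node. (Hypothetical helper.)
render_for_node() {
  sed -e "/--initial-cluster=/!s/192\.168\.99\.7/$2/g" \
      -e "s/--name=.*/--name=$1/"
}

# Demonstration on an excerpt of the flags shown above.
rendered=$(render_for_node k8s-master-02 192.168.99.8 <<'EOF'
    - --advertise-client-urls=https://192.168.99.7:2379
    - --initial-cluster=k8s-master-01=https://192.168.99.7:2380,k8s-master-02=https://192.168.99.8:2380,k8s-master-03=https://192.168.99.9:2380
    - --listen-client-urls=https://127.0.0.1:2379,https://192.168.99.7:2379
    - --name=k8s-master-01
EOF
)
echo "$rendered"
```

The rendered output should still be compared against the original by eye before it is placed in /etc/kubernetes/manifests.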
  • Check the etcd cluster status

    etcdctl --endpoints=192.168.99.7:2379,192.168.99.8:2379,192.168.99.9:2379 --cacert="/etc/kubernetes/pki/etcd/ca.crt" --cert="/etc/kubernetes/pki/etcd/etcd-cluster.crt" --key="/etc/kubernetes/pki/etcd/etcd-cluster.key" endpoint status -w table
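The same connection settings can also be supplied through etcdctl's ETCDCTL_* environment variables, which keeps follow-up checks short. This is a sketch under the assumption that the certificate paths match the manifest above; the etcdctl calls are skipped when the binary is not in PATH.

```shell
#!/bin/sh
set -u

# Connection settings taken from the status command above.
export ETCDCTL_API=3
export ETCDCTL_ENDPOINTS=https://192.168.99.7:2379,https://192.168.99.8:2379,https://192.168.99.9:2379
export ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt
export ETCDCTL_CERT=/etc/kubernetes/pki/etcd/etcd-cluster.crt
export ETCDCTL_KEY=/etc/kubernetes/pki/etcd/etcd-cluster.key

if command -v etcdctl >/dev/null 2>&1; then
  # Each endpoint should report healthy once all three members have started.
  etcdctl endpoint health || echo "at least one endpoint is not healthy" >&2
  # All three members should be listed as started.
  etcdctl member list -w table || echo "could not list members" >&2
else
  echo "etcdctl not found in PATH; skipping checks" >&2
fi
```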


Restore the other components

Run kubeadm certs renew all to renew the certificates of the related components.

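Move the following files out of /etc/kubernetes/manifests, update their certificate configuration if needed, and move them back to their original location afterwards: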

kube-apiserver.yaml

kube-controller-manager.yaml

kube-scheduler.yaml
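Moving the three manifests out and back can be sketched as a pair of calls to one helper. move_manifests and the staging directory are hypothetical, and the demonstration runs in a scratch directory; on a real node the source would be /etc/kubernetes/manifests, and the kubelet stops or starts the static pods as the files disappear and reappear.

```shell
#!/bin/sh
set -eu

# move_manifests SRC DST: move the three control-plane manifests from SRC
# to DST, skipping any that are absent. (Hypothetical helper.)
move_manifests() {
  src=$1
  dst=$2
  mkdir -p "$dst"
  for f in kube-apiserver.yaml kube-controller-manager.yaml kube-scheduler.yaml; do
    if [ -f "$src/$f" ]; then
      mv "$src/$f" "$dst/"
    fi
  done
}

# Demonstration in a scratch directory.
demo=$(mktemp -d)
mkdir -p "$demo/manifests"
touch "$demo/manifests/kube-apiserver.yaml" \
      "$demo/manifests/kube-controller-manager.yaml" \
      "$demo/manifests/kube-scheduler.yaml" \
      "$demo/manifests/etcd.yaml"

move_manifests "$demo/manifests" "$demo/staging"   # components stop
# ... update certificate paths in the staged files here if needed ...
move_manifests "$demo/staging" "$demo/manifests"   # components start again

ls "$demo/manifests"
```

etcd.yaml is deliberately left in place so that etcd keeps running while the other components are cycled.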