likes
comments
collection
share

基于k8s的Redis哨兵高可用部署方案

作者站长头像
站长
· 阅读数 1

Redis是目前业界事实标准的缓存中间件。我们在实际项目中也采用了Redis对登录令牌、用户信息和业务数据等做了缓存层来应对大流量高并发的访问请求,来减轻数据库压力提高系统可用性,同时降低请求延迟提升用户体验。虽然Redis本身性能非常优异,在开发和测试环境使用单机版Redis即可满足需求,但是生产环境还是建议采用集群方式进行部署,可以避免单点故障进一步提升系统整体的可用性。我们考虑过Redis的集群(cluster)和哨兵(sentinel)两种部署模式,从下表中的四个维度进行了对比:

模式可用性扩展性复杂度成本
Redis Cluster需结合Sentinel
Redis Sentinel仅读操作可以扩展

Redis集群相比哨兵模式的优势主要是具备数据分片能力,扩展性高,但结合我们的场景暂时还不需要考虑扩展性。Redis集群想实现高可用需要结合哨兵模式才可以,也就是集群中每个节点都要按哨兵模式来部署,slave实时复制master节点的数据,架构复杂度和部署运维成本高。综合考虑各种因素,我们选择了哨兵模式。

另一个问题就是基于主机部署还是基于k8s部署?其实不论是主机还是k8s都可以实现目标,主要还是看团队和项目的实际情况。如果团队里有专业的运维岗位或者有对shell和ansible脚本非常熟悉的同事,选择主机部署就可以。如果团队主推的是devops的研发模式,并且已经用上了高可用k8s集群,那么选择k8s部署就是最优选择。我们的情况属于后者,核心系统全部已经容器化,基于k8s来做CICD。所以我们最终采用了基于k8s的Redis哨兵高可用部署方案。

什么是哨兵和哨兵模式?

哨兵是可以用redis-sentinel /etc/redis/sentinel.conf命令直接启动的一个进程。redis-sentinel可执行文件是跟redis一起安装的。哨兵模式是一种实现redis高可用的部署模式,即部署一组哨兵构成一套分布式系统,实现redis主从集群的自动故障转移。具体来说,哨兵可以提供如下四种能力:

  • 监控能力:哨兵可以持续监控redis主从节点是否正常工作;
  • 通知能力:如果有redis实例出现什么问题,哨兵可以通过API通知到系统管理员或者其他程序;
  • 自动故障转移:如果主节点出现故障,哨兵可以自动启动故障转移机制,提升一台redis从节点为新的主节点,使redis可以自动从故障中恢复继续提供服务;
  • 配置中心:哨兵承担着配置中心的职责,客户端可以连接哨兵并查询到redis主节点的信息,如果发生故障转移,还可以收到新的主节点的信息;

自动故障转移的过程是这样的:每一台哨兵节点都会通过向redis主节点发送PING心跳命令,如果发现主节点失联,则会通过gossip的机制,向其他哨兵节点确认主节点是否失联,如果拿到quorum份确认信息(算上自己),则确定该主节点出现故障,发起类似raft的选举,当收到大多数哨兵的投票,则可以开始执行故障转移。首先按一定规则从全部从节点中选出最合适的一个,提升为主节点,然后更新其他从节点的配置,使得它们从新的主节点同步数据,最后通知客户端发生了故障转移并给到新的主节点的地址。

部署环境

本方案在高可用多架构k8s集群环境验证可行,以下是详情参数:

主机名配置角色k8s版本操作系统kernel版本容器运行时
master116C32Gectd,masterv1.18.12CentOS73.10.0 x86_64docker://19.3.9
master216C32Gectd,masterv1.18.12CentOS73.10.0 x86_64docker://19.3.9
master316C32Gectd,masterv1.18.12CentOS73.10.0 x86_64docker://19.3.9
node116C32Gworkerv1.18.12CentOS73.10.0 x86_64docker://19.3.9
node216C32Gworkerv1.18.12CentOS73.10.0 x86_64docker://19.3.9
node316C32Gworkerv1.18.12Kylin V104.19.90 aarch64docker://19.3.9

架构说明

下图为生产环境架构:

基于k8s的Redis哨兵高可用部署方案

下图为开发环境架构,比上图增加了负载均衡器Metallb,用于暴露Sentinel和Redis服务给开发者本地环境,方便开发和自测工作:

基于k8s的Redis哨兵高可用部署方案

k8s内部应用访问Redis直连k8s内部的Sentinel服务地址(图中绿底地址)即可。开发环境需要连接可以从k8s外部访问的ip:port地址(图中蓝底地址)。

Sentinel和redis实例都采用statefulset方式部署,在这里的好处主要是每个pod都可以有一个固定的访问地址,比如3个sentinel pod的访问地址分别是sentinel-0.sentinel,sentinel-1.sentinel,sentinel-2.sentinel,3个redis pod的访问地址分别是redis-0.redis,redis-1.redis,redis-2.redis。地址规则可以参考 kubernetes.io/docs/concep… 。3个redis实例为一主两从。我们还给所有的sentinel和redis实例配置了podAntiAffinity,确保每个pod都部署在不同的worker节点上,实现真正的高可用。

方案实施

完整方案包括以下配置和脚本文件,下文逐个说明:

.
├── resources
│   ├── redis-config.yml             # redis初始配置文件
│   ├── redis-scripts-config.yml     # redis和sentinel初始化脚本
│   ├── redis-secret.yml             # redis和sentinel的密码
│   ├── redis-services-dev-0.yml     # 开发环境使用的service
│   ├── redis-services-dev-1.yml     # 开发环境使用的service
│   ├── redis-services.yml           # 测试和正式环境使用的service
│   ├── redis-stateful.yml           # redis有状态pod部署文件
│   └── sentinel-stateful.yml        # sentinel有状态pod部署文件
├── startCluster-dev.sh              # 开发环境使用的部署脚本
└── startCluster.sh                  # 测试和生产环境使用的部署脚本

redis-config.yml文件用于redis实例的部署,REDIS_NODES需定义redis实例的地址,sentinel会通过这里配置的地址来与各redis实例通信,多个地址用英文逗号分隔。下文我们还会提到,redis.conf里的配置项可以按需修改。

apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-config-map
data:
  REDIS_NODES: "redis-0.redis,redis-1.redis,redis-2.redis"

  redis.conf: |
    bind 0.0.0.0
    protected-mode no
    port 6379
    tcp-backlog 511
    timeout 10
    tcp-keepalive 30
    daemonize no
    supervised no
    pidfile "/var/run/redis_6379.pid"
    loglevel notice
    logfile ""
    databases 16
    always-show-logo yes
    save ""
    stop-writes-on-bgsave-error yes
    rdbcompression yes
    rdbchecksum yes
    rdb-del-sync-files no
    dir "/data"
    replica-serve-stale-data yes
    replica-read-only yes
    repl-diskless-sync no
    repl-diskless-sync-delay 5
    repl-diskless-load disabled
    appendonly no
    repl-disable-tcp-nodelay no
    replica-priority 100
    acllog-max-len 128

redis-scripts-config.yml文件用于sentinel实例和redis实例的初始化。在statefulset文件里配置了initContainers参数,pod创建过程中会先启动init container来执行这个文件中配置的脚本,然后退出init container,启动真正需要的container。redis-scripts-config.yml文件中配置了两段脚本sentinel_init.sh和redis_init.sh。

apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-scripts-config-map
data:  

  sentinel_init.sh: |
    #!/bin/bash

    for i in ${REDIS_NODES//,/ }
    do
        echo "finding master at $i"
        MASTER=$(redis-cli --no-auth-warning --raw -h $i -a ${REDIS_PASSWORD} info replication | awk '{print $1}' | grep master_host: | cut -d ":" -f2)
        
        if [ "${MASTER}" == "" ]; then
            echo "no master found"
            MASTER=
        else
            echo "found ${MASTER}"
            break
        fi
        
    done

    echo "sentinel monitor mymaster ${MASTER} 6379 2" >> /tmp/master
    echo "port 5000
    sentinel resolve-hostnames yes
    sentinel announce-hostnames yes
    $(cat /tmp/master)
    sentinel down-after-milliseconds mymaster 1000
    sentinel failover-timeout mymaster 10000
    sentinel parallel-syncs mymaster 1
    sentinel sentinel-pass ${REDIS_PASSWORD}
    sentinel auth-pass mymaster ${REDIS_PASSWORD}
    requirepass ${REDIS_PASSWORD}
    sentinel announce-ip ${HOSTNAME}.sentinel
    sentinel announce-port 5000

    " > /etc/redis/sentinel.conf
    cat /etc/redis/sentinel.conf


  redis_init.sh: |
    #!/bin/bash

    cp /tmp/redis/redis.conf /etc/redis/redis.conf
    echo "requirepass ${REDIS_PASSWORD}" >> /etc/redis/redis.conf
    echo "masterauth ${REDIS_PASSWORD}" >> /etc/redis/redis.conf
    echo "replica-announce-ip ${HOSTNAME}.redis" >> /etc/redis/redis.conf
    echo "replica-announce-port 6379 " >> /etc/redis/redis.conf
    
    echo "finding master..."

    if [ "$(timeout 5 redis-cli -h sentinel -p 5000 -a ${REDIS_PASSWORD} ping)" != "PONG" ]; then

      echo "sentinel not found, defaulting to redis-0"

      if [ ${HOSTNAME} == "redis-0" ]; then
        echo "this is redis-0, not updating config..."
      else
        echo "updating redis.conf..."
        echo "repl-ping-replica-period 3" >> /etc/redis/redis.conf
        echo "slave-read-only no" >> /etc/redis/redis.conf
        echo "slaveof redis-0.redis 6379" >> /etc/redis/redis.conf
      fi

    else

      echo "sentinel found, finding master"
      MASTER="$(redis-cli -h sentinel -p 5000 -a ${REDIS_PASSWORD} sentinel get-master-addr-by-name mymaster | grep -E '(^redis-*)|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})')"

      if [ "${HOSTNAME}.redis" == ${MASTER} ]; then
        echo "this is master, not updating config..."
      else
        echo "master found : ${MASTER}, updating redis.conf"
        echo "slave-read-only no" >> /etc/redis/redis.conf
        echo "slaveof ${MASTER} 6379" >> /etc/redis/redis.conf
        echo "repl-ping-replica-period 3" >> /etc/redis/redis.conf
      fi

    fi

sentinel_init.sh脚本的逻辑是先遍历所有的redis实例,执行INFO命令查找master节点,然后生成/etc/redis/sentinel.conf配置文件用于启动sentinel容器。

redis_init.sh脚本的逻辑是先拷贝redis-config.yml文件里的redis配置到/etc/redis/redis.conf文件,然后配置redis密码、ip和地址等信息,再向sentinel服务发送PING命令确认sentinel是否存在,如果不存在,则默认将自己配置为redis-0实例的slave节点,如果存在,则向sentinel服务发送SENTINEL GET-MASTER-ADDR-BY-NAME命令查询当前master节点信息并将自己配置为该master节点的slave节点。

redis-secret.yml文件用于配置sentinel和redis实例的密码。

kind: Secret
apiVersion: v1
metadata:
  name: redis-secret
type: Opaque
stringData:
  REDIS_PASSWORD: "xxxxxx"

redis-services-dev-0.ymlredis-services-dev-1.ymlredis-services.yml这三个文件都是用于配置sentinel和redis服务的。包含dev字符的前两个文件用于开发环境,第三个文件用于生产环境。区别是生产环境的sentinel服务为ClusterIP类型,redis服务为Headless类型,不对集群外暴露服务,仅供集群内部的pod实例访问;开发环境的sentinel服务为LoadBalancer类型,redis服务配置了四个,一个为Headless类型,跟生产环境的相同,另外三个均为LoadBalancer类型且分别定向到三个redis实例,这样就确保了开发者本地客户端可以直连sentinel服务和所有redis实例。客户端的工作原理是先从sentinel服务查询master节点和slave节点的地址,然后客户端直连master节点(读写)或slave节点(只读)执行数据操作,且master节点会通过sentinel做failover,三个redis实例都有可能升为master节点,所以我们需要将所有redis实例也暴露到集群外。

redis-services.yml

# Headless service so sentinel could access redisses using syntax <pod-name>.<service-name>

apiVersion: v1
kind: Service
metadata:
  name: redis
  labels:
    app: redis
    app.kubernetes.io/component: redis
    app.kubernetes.io/instance: redis
spec:
  clusterIP: None
  ports:
  - port: 6379
    targetPort: 6379
    name: redis
  selector:
    app: redis

---

# Sentinel service used for project pod connection

apiVersion: v1
kind: Service
metadata:
  name: sentinel
  labels:
    app: sentinel
    app.kubernetes.io/component: sentinel
    app.kubernetes.io/instance: sentinel
spec:
  type: ClusterIP
  sessionAffinity: None
  ports:
  - port: 5000
    targetPort: 5000
    name: sentinel
  selector:
    app: sentinel

redis-services-dev-0.yml

# Headless service so sentinel could access redisses using syntax <pod-name>.<service-name>

apiVersion: v1
kind: Service
metadata:
  name: redis
  labels:
    app: redis
    app.kubernetes.io/component: redis
    app.kubernetes.io/instance: redis
spec:
  clusterIP: None
  ports:
  - port: 6379
    targetPort: 6379
    name: redis
  selector:
    app: redis

---
# Sentinel service used for project pod connection

apiVersion: v1
kind: Service
metadata:
  name: sentinel
  labels:
    app: sentinel
    app.kubernetes.io/component: sentinel
    app.kubernetes.io/instance: sentinel
  annotations:
    metallb.universe.tf/address-pool: pool-01
spec:
  type: LoadBalancer
  sessionAffinity: None
  ports:
  - port: 5000
    targetPort: 5000
    name: sentinel
  selector:
    app: sentinel

redis-services-dev-1.yml

apiVersion: v1
kind: Service
metadata:
  name: redis-0
  labels:
    app: redis-0
    app.kubernetes.io/component: redis-0
    app.kubernetes.io/instance: redis-0
  annotations:
    metallb.universe.tf/address-pool: pool-01
spec:
  type: LoadBalancer
  ports:
  - port: 6379
    targetPort: 6379
    name: redis-0
  selector:
    statefulset.kubernetes.io/pod-name: redis-0

---

apiVersion: v1
kind: Service
metadata:
  name: redis-1
  labels:
    app: redis-1
    app.kubernetes.io/component: redis-1
    app.kubernetes.io/instance: redis-1
  annotations:
    metallb.universe.tf/address-pool: pool-01
spec:
  type: LoadBalancer
  ports:
  - port: 6379
    targetPort: 6379
    name: redis-1
  selector:
    statefulset.kubernetes.io/pod-name: redis-1

---

apiVersion: v1
kind: Service
metadata:
  name: redis-2
  labels:
    app: redis-2
    app.kubernetes.io/component: redis-2
    app.kubernetes.io/instance: redis-2
  annotations:
    metallb.universe.tf/address-pool: pool-01
spec:
  type: LoadBalancer
  ports:
  - port: 6379
    targetPort: 6379
    name: redis-2
  selector:
    statefulset.kubernetes.io/pod-name: redis-2

这里说明一下redis-services-dev-1.yml文件里的selector属性的statefulset.kubernetes.io/pod-name: redis-x标签是由k8s自动为我们打上的,不需要我们手动打标签。

redis-stateful.yml文件用于部署redis实例,配置了3副本和podAntiAffinity。注意副本数不能多于worker节点数,每个worker节点只能有一个redis pod,多出来的会一直pending;serviceName属性值必须与redis服务名保持一致,才能有redis-x.redis这样的地址。

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
spec:
  serviceName: redis
  replicas: 3
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      initContainers:
      - name: config
        image: redis:7.2.4
        env:
          - name: REDIS_PASSWORD
            valueFrom:
              secretKeyRef:
                name: redis-secret
                key: REDIS_PASSWORD
        command: [ "sh", "-c", "/scripts/redis_init.sh" ]
        volumeMounts:
        - name: redis-config
          mountPath: /etc/redis/
        - name: config
          mountPath: /tmp/redis/
        - name: init-script
          mountPath: /scripts/
      containers:
      - name: redis
        image: redis:7.2.4
        command: ["redis-server"]
        args: ["/etc/redis/redis.conf"]
        ports:
        - containerPort: 6379
          name: redis
        volumeMounts:
        - name: data
          mountPath: /data
        - name: redis-config
          mountPath: /etc/redis/
      volumes:
      - name: data
        emptyDir: {}
      - name: redis-config
        emptyDir: {}
      - name: init-script
        configMap:
          name: redis-scripts-config-map
          defaultMode: 0777
          items:
          - key: redis_init.sh
            path: redis_init.sh
      - name: config
        configMap:
          name: redis-config-map
          items:
          - key: redis.conf
            path: redis.conf
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - redis
            topologyKey: kubernetes.io/hostname

sentinel-stateful.yml文件用于部署sentinel实例,配置了3副本和podAntiAffinity。注意副本数不能多于worker节点数,每个worker节点只能有一个sentinel pod,多出来的会一直pending;serviceName属性值必须与sentinel服务名保持一致,才能有sentinel-x.sentinel这样的地址。

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: sentinel
spec:
  serviceName: sentinel
  replicas: 3
  selector:
    matchLabels:
      app: sentinel
  template:
    metadata:
      labels:
        app: sentinel
    spec:
      initContainers:
      - name: config
        image: redis:7.2.4
        env:
          - name: REDIS_NODES
            valueFrom:
              configMapKeyRef:
                name: redis-config-map
                key: REDIS_NODES
          - name: REDIS_PASSWORD
            valueFrom:
              secretKeyRef:
                name: redis-secret
                key: REDIS_PASSWORD
        command: [ "sh", "-c", "/scripts/sentinel_init.sh" ]
        volumeMounts:
        - name: redis-config
          mountPath: /etc/redis/
        - name: init-script
          mountPath: /scripts/
      containers:
      - name: sentinel
        image: redis:7.2.4
        command: ["redis-sentinel"]
        args: ["/etc/redis/sentinel.conf"]
        ports:
        - containerPort: 5000
          name: sentinel
        volumeMounts:
        - name: redis-config
          mountPath: /etc/redis/
        - name: data
          mountPath: /data
      volumes:
      - name: init-script
        configMap:
          name: redis-scripts-config-map
          defaultMode: 0777
          items:
          - key: sentinel_init.sh
            path: sentinel_init.sh
      - name: redis-config
        emptyDir: {}
      - name: data
        emptyDir: {}
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - sentinel
            topologyKey: kubernetes.io/hostname

startCluster-dev.shstartCluster.sh分别为开发环境和生产环境的一键部署脚本。两份脚本只有Services部分不同。

startCluster-dev.sh

FILE_PATH="${1:-"./resources"}"
NAMESPACE="redis-sentinel"

# Namespace
kubectl create ns $NAMESPACE

# Config files
kubectl apply -f "$FILE_PATH/redis-config.yml" -n $NAMESPACE
kubectl apply -f "$FILE_PATH/redis-scripts-config.yml" -n $NAMESPACE
kubectl apply -f "$FILE_PATH/redis-secret.yml" -n $NAMESPACE

# Services
kubectl apply -f "$FILE_PATH/redis-services-dev-0.yml" -n $NAMESPACE
kubectl apply -f "$FILE_PATH/redis-services-dev-1.yml" -n $NAMESPACE

# Redis
kubectl apply -f "$FILE_PATH/redis-stateful.yml" -n $NAMESPACE

# TODO 有问题:当只有一个redis pod满足condition=ready条件的时候就往下执行启动Sentinel的命令了,导致Sentinel的pod启动失败
# kubectl wait --namespace=$NAMESPACE \
#              --for=condition=ready pod \
#              --selector=app=redis \
#              --timeout=280s

sleep 10

# Sentinel
# 等待redis的pod都变为running状态以后,再执行下面的命令
kubectl apply -f "$FILE_PATH/sentinel-stateful.yml" -n $NAMESPACE

startCluster.sh

FILE_PATH="${1:-"./resources"}"
NAMESPACE="redis-sentinel"

# Namespace
kubectl create ns $NAMESPACE

# Config files
kubectl apply -f "$FILE_PATH/redis-config.yml" -n $NAMESPACE
kubectl apply -f "$FILE_PATH/redis-scripts-config.yml" -n $NAMESPACE
kubectl apply -f "$FILE_PATH/redis-secret.yml" -n $NAMESPACE

# Services
kubectl apply -f "$FILE_PATH/redis-services.yml" -n $NAMESPACE

# Redis
kubectl apply -f "$FILE_PATH/redis-stateful.yml" -n $NAMESPACE

# TODO 有问题:当只有一个redis pod满足condition=ready条件的时候就往下执行启动Sentinel的命令了,导致Sentinel的pod启动失败
# kubectl wait --namespace=$NAMESPACE \
#              --for=condition=ready pod \
#              --selector=app=redis \
#              --timeout=280s

sleep 10

# Sentinel
# 等待redis的pod都变为running状态以后,再执行下面的命令
kubectl apply -f "$FILE_PATH/sentinel-stateful.yml" -n $NAMESPACE

将全套配置和脚本打包发送到一台k8s节点上,执行sh startCluster.shsh startCluster-dev.sh命令后稍等10秒钟再查看部署是否成功。

在生产环境如果出现类似的输出即表示部署成功:

[root@master1 resources]# kubectl get all -n redis-sentinel -o wide
NAME             READY   STATUS    RESTARTS   AGE   IP              NODE                  NOMINATED NODE   READINESS GATES
pod/redis-0      1/1     Running   0          44m   x.x.x.x         node3                 <none>           <none>
pod/redis-1      1/1     Running   0          44m   x.x.x.x         node2                 <none>           <none>
pod/redis-2      1/1     Running   0          44m   x.x.x.x         node1                 <none>           <none>
pod/sentinel-0   1/1     Running   0          39m   x.x.x.x         node1                 <none>           <none>
pod/sentinel-1   1/1     Running   0          39m   x.x.x.x         node2                 <none>           <none>
pod/sentinel-2   1/1     Running   0          39m   x.x.x.x         node3                 <none>           <none>

NAME               TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE   SELECTOR
service/redis      ClusterIP   None            <none>        6379/TCP   44m   app=redis
service/sentinel   ClusterIP   x.x.x.x         <none>        5000/TCP   36s   app=sentinel

在开发环境如果出现类似的输出即表示部署成功:

[root@master1 ~]# kubectl get all -n redis-sentinel -o wide
NAME                                READY   STATUS    RESTARTS   AGE     IP              NODE                  NOMINATED NODE   READINESS GATES
pod/redis-0                         1/1     Running   0          4d12h   x.x.x.x         node3                 <none>           <none>
pod/redis-1                         1/1     Running   0          21h     x.x.x.x         node1                 <none>           <none>
pod/redis-2                         1/1     Running   0          4d12h   x.x.x.x         node2                 <none>           <none>
pod/sentinel-0                      1/1     Running   0          4d13h   x.x.x.x         node3                 <none>           <none>
pod/sentinel-1                      1/1     Running   0          4d13h   x.x.x.x         node2                 <none>           <none>
pod/sentinel-2                      1/1     Running   0          4d13h   x.x.x.x         node1                 <none>           <none>

NAME                           TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE     SELECTOR
service/redis                  ClusterIP      None             <none>        6379/TCP         4d13h   app=redis
service/redis-0                LoadBalancer   x.x.x.x          192.168.1.2   6379:30417/TCP   4d13h   statefulset.kubernetes.io/pod-name=redis-0
service/redis-1                LoadBalancer   x.x.x.x          192.168.1.3   6379:30753/TCP   4d13h   statefulset.kubernetes.io/pod-name=redis-1
service/redis-2                LoadBalancer   x.x.x.x          192.168.1.4   6379:30913/TCP   4d13h   statefulset.kubernetes.io/pod-name=redis-2
service/sentinel               LoadBalancer   x.x.x.x          192.168.1.1   5000:31882/TCP   4d13h   app=sentinel

方案验证

本节将分别演示Java和Golang两种技术栈的应用程序如何连接redis哨兵集群,并且验证杀掉master节点后sentinel是否可以正确实现failover。

Java

Java示例工程采用JDK 1.8 + Spring boot 2.7 + Lettuce连接池技术栈。工程结构如下:

.
├── pom.xml
└── src
    └── main
        ├── java
        │   └── cloud
        │       └── unionj
        │           └── cache
        │               ├── RedisApplication.java    # 启动类
        │               └── config
        │                   └── RedisConfig.java     # 配置类
        └── resources
            └── application.yml                      # 配置文件

8 directories, 4 files

代码说明请参考行内注释。

application.yml

spring:
  redis:
    password: xxxxxx               # redis密码
    sentinel:
      master: mymaster             # 集群名称
      nodes:
        - 192.168.1.1:5000         # sentinel连接地址
      password: xxxxxx             # sentinel密码
    lettuce:
      shutdown-timeout: 200ms

RedisConfig.java

package cloud.unionj.cache.config;

import io.lettuce.core.ReadFrom;
import lombok.RequiredArgsConstructor;
import org.springframework.boot.autoconfigure.data.redis.RedisProperties;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.connection.RedisPassword;
import org.springframework.data.redis.connection.RedisSentinelConfiguration;
import org.springframework.data.redis.connection.lettuce.LettuceClientConfiguration;
import org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory;

@RequiredArgsConstructor
@Configuration
public class RedisConfig {

    private final RedisProperties redisProperties;

    @Bean
    protected LettuceConnectionFactory redisConnectionFactory() {

        // 配置读写分离。从slave节点读
        LettuceClientConfiguration clientConfig = LettuceClientConfiguration.builder()
            .readFrom(ReadFrom.REPLICA_PREFERRED)
            .build();
        
        // 设置集群名称
        RedisSentinelConfiguration sentinelConfig = new RedisSentinelConfiguration()
            .master(redisProperties.getSentinel().getMaster());

        // 设置sentinel连接地址
        redisProperties.getSentinel().getNodes()
            .forEach(s -> sentinelConfig.sentinel(s.split(":")[0], Integer.valueOf(s.split(":")[1])));
        // 设置redis密码
        sentinelConfig.setPassword(RedisPassword.of(redisProperties.getPassword()));
        // 设置sentinel密码
        sentinelConfig.setSentinelPassword(RedisPassword.of(redisProperties.getSentinel().getPassword()));

        return new LettuceConnectionFactory(sentinelConfig, clientConfig);
    }

}

RedisApplication.java

package cloud.unionj.cache;

import javax.annotation.Resource;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.data.redis.connection.RedisSentinelConnection;
import org.springframework.data.redis.connection.RedisServer;

@SpringBootApplication
public class RedisApplication implements CommandLineRunner {

    @Resource
    private RedisConnectionFactory redisConnectionFactory;

    private static Logger LOG = LoggerFactory.getLogger(RedisApplication.class);

    public static void main(String[] args) {
        LOG.info("STARTING THE APPLICATION");
        SpringApplication.run(RedisApplication.class, args);
        LOG.info("APPLICATION FINISHED");
    }

    @Override
    public void run(String... args) throws Exception {
        RedisSentinelConnection sentinelConnection = redisConnectionFactory.getSentinelConnection();
        // 每隔3秒打印一次master节点的host
        while (true) {
            sentinelConnection.masters().stream().map(RedisServer::getHost).forEach(host -> {
                LOG.info("=========== Current master is {} ===========", host);
            });
            Thread.sleep(3000);
        }
    }
}

我们首先通过idea启动程序,等待输出=========== Current master is redis-x.redis ===========后,执行命令kubectl delete pod redis-x -n redis-sentinel杀掉redis-x这个pod,观察sentinel是否可以正确实现failover。

...
2024-02-25 16:01:22.848  INFO 58711 --- [           main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat initialized with port(s): 8080 (http)
2024-02-25 16:01:22.853  INFO 58711 --- [           main] o.apache.catalina.core.StandardService   : Starting service [Tomcat]
2024-02-25 16:01:22.853  INFO 58711 --- [           main] org.apache.catalina.core.StandardEngine  : Starting Servlet engine: [Apache Tomcat/9.0.82]
2024-02-25 16:01:22.939  INFO 58711 --- [           main] o.a.c.c.C.[Tomcat].[localhost].[/]       : Initializing Spring embedded WebApplicationContext
2024-02-25 16:01:22.939  INFO 58711 --- [           main] w.s.c.ServletWebServerApplicationContext : Root WebApplicationContext: initialization completed in 729 ms
2024-02-25 16:01:23.447  INFO 58711 --- [           main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat started on port(s): 8080 (http) with context path ''
2024-02-25 16:01:23.455  INFO 58711 --- [           main] cloud.unionj.cache.RedisApplication      : Started RedisApplication in 1.566 seconds (JVM running for 1.958)
2024-02-25 16:01:23.455  INFO 58711 --- [           main] cloud.unionj.cache.RedisApplication      : Started RedisApplication in 1.566 seconds (JVM running for 1.958)
2024-02-25 16:01:23.674  INFO 58711 --- [           main] cloud.unionj.cache.RedisApplication      : =========== Current master is redis-2.redis ===========
2024-02-25 16:01:26.690  INFO 58711 --- [           main] cloud.unionj.cache.RedisApplication      : =========== Current master is redis-2.redis ===========
2024-02-25 16:01:29.715  INFO 58711 --- [           main] cloud.unionj.cache.RedisApplication      : =========== Current master is redis-2.redis ===========
2024-02-25 16:01:32.732  INFO 58711 --- [           main] cloud.unionj.cache.RedisApplication      : =========== Current master is redis-0.redis ===========
2024-02-25 16:01:35.761  INFO 58711 --- [           main] cloud.unionj.cache.RedisApplication      : =========== Current master is redis-0.redis ===========
2024-02-25 16:01:38.779  INFO 58711 --- [           main] cloud.unionj.cache.RedisApplication      : =========== Current master is redis-0.redis ===========

从上面的控制台输出的第12行和第13行,我们可以看到master节点成功实现了failover。我们的方案验证成功。

Golang

Golang示例工程采用Go 1.22.0 + go-doudou v2.3.0 + go-redis技术栈。工程结构如下:

.
├── Dockerfile
├── app.yml                             # 配置文件
├── cache
│   └── redis.go                        # redis连接对象
├── cmd
│   └── main.go                         # main函数
├── config
│   └── config.go                       # 全局配置对象
├── dto
│   └── dto.go
├── go.mod
├── go.sum
├── redissentinelgo_openapi3.go
├── redissentinelgo_openapi3.json
├── svc.go
├── svcimpl.go
└── transport
    └── httpsrv
        ├── handler.go
        ├── handlerimpl.go
        └── middleware.go

6 directories, 15 files

首先看一下app.yml文件。redissentinelgo为配置项前缀,可在config.go文件中自定义修改。go-doudou从v2.3.0版本起支持yaml格式的配置文件里通过中横线-来设置列表项,如redissentinelgo.redis.sentinel.nodes配置项可以通过-配置多个多个sentinel地址。

redissentinelgo:                     # 配置项前缀
  redis:
    password: xxxxxx                 # redis密码
    sentinel:
      master: mymaster               # 集群名称
      nodes:
        - 192.168.1.1:5000           # sentinel连接地址
      password: xxxxxx               # sentinel密码

再看一下config.go文件。请参考行内注释。

/**
* Generated by go-doudou v2.3.0.
* You can edit it as your need.
 */
package config

import (
   "github.com/unionj-cloud/go-doudou/v2/framework/config"
   "github.com/unionj-cloud/go-doudou/v2/toolkit/envconfig"
   "github.com/unionj-cloud/go-doudou/v2/toolkit/zlogger"
)

var G_Config *Config

type Config struct {
   Redis struct {
      Password string
      Sentinel struct {
         Master   string
         // 必须加values_by:"index"结构体标签才可以启用yaml中的中横线语法
         Nodes    []string `values_by:"index" required:"true"`
         Password string
      }
   }
   config.Config
}

func init() {
   var conf Config
   // redissentinelgo可以改成任意值
   err := envconfig.Process("redissentinelgo", &conf)
   if err != nil {
      zlogger.Panic().Msgf("Error processing environment variables: %v", err)
   }
   G_Config = &conf
}

func LoadFromEnv() *Config {
   return G_Config
}

然后看一下redis.go文件。这里声明了两个全局变量G_sentinelG_rdbG_sentinel用于连接sentinel服务。G_rdb用于连接redis进行数据操作。

package cache

import (
   "github.com/redis/go-redis/v9"
   "redis-sentinel-go/config"
)

var G_sentinel = redis.NewSentinelClient(&redis.Options{
   Addr:     config.G_Config.Redis.Sentinel.Nodes[0],
   Password: config.G_Config.Redis.Sentinel.Password,
})

var G_rdb = redis.NewFailoverClusterClient(&redis.FailoverOptions{
   MasterName:       config.G_Config.Redis.Sentinel.Master,
   SentinelAddrs:    config.G_Config.Redis.Sentinel.Nodes,
   SentinelPassword: config.G_Config.Redis.Sentinel.Password,
   Password:         config.G_Config.Redis.Password,
})

最后看一下main.go文件。与Java示例类似,我们每隔3秒发送一次SENTINEL GET-MASTER-ADDR-BY-NAME命令查询当前master节点信息,中途执行kubectl delete pod redis-x -n redis-sentinel命令杀死master节点,观察控制台输出的master节点有没有变化。如果有变化,证明sentinel可以正确实现failover。

/**
* Generated by go-doudou v2.3.0.
* You can edit it as your need.
 */
package main

import (
   "context"
   "github.com/samber/lo"
   "github.com/unionj-cloud/go-doudou/v2/framework/rest"
   "github.com/unionj-cloud/go-doudou/v2/toolkit/zlogger"
   service "redis-sentinel-go"
   "redis-sentinel-go/cache"
   "redis-sentinel-go/config"
   "redis-sentinel-go/transport/httpsrv"
   "time"
)

func main() {
   conf := config.LoadFromEnv()
   svc := service.NewRedisSentinelGo(conf)
   handler := httpsrv.NewRedisSentinelGoHandler(svc)
   srv := rest.NewRestServer()
   srv.AddRoute(httpsrv.Routes(handler)...)
    
   // 这里我们每隔3秒发送一次SENTINEL GET-MASTER-ADDR-BY-NAME命令查询当前master节点信息
   for {
      addr, err := cache.G_sentinel.GetMasterAddrByName(context.Background(), config.G_Config.Redis.Sentinel.Master).Result()
      if err != nil {
         zlogger.Err(err).Msg(err.Error())
      }
      lo.ForEach(addr, func(item string, index int) {
         zlogger.Info().Msg(item)
      })
      time.Sleep(3 * time.Second)
   }

   srv.Run()
}

通过Goland IDE启动程序后,我们观察到如下日志输出:

GOROOT=/Users/wubin1989/go1.22.0 #gosetup
GOPATH=/Users/wubin1989/go #gosetup
/Users/wubin1989/go1.22.0/bin/go build -o /private/var/folders/3m/c4phkfbs0zjbx6j5ytprprbc0000gn/T/___go_build_redis_sentinel_go_cmd redis-sentinel-go/cmd #gosetup
/private/var/folders/3m/c4phkfbs0zjbx6j5ytprprbc0000gn/T/___go_build_redis_sentinel_go_cmd
2024/02/25 18:04:51 maxprocs: Leaving GOMAXPROCS=16: CPU quota undefined
2024-02-25 18:04:51 INF cmd/main.go:32 > redis-0.redis
2024-02-25 18:04:51 INF cmd/main.go:32 > 6379
2024-02-25 18:04:54 INF cmd/main.go:32 > redis-0.redis
2024-02-25 18:04:54 INF cmd/main.go:32 > 6379
2024-02-25 18:04:57 INF cmd/main.go:32 > redis-0.redis
2024-02-25 18:04:57 INF cmd/main.go:32 > 6379
2024-02-25 18:05:00 INF cmd/main.go:32 > redis-0.redis
2024-02-25 18:05:00 INF cmd/main.go:32 > 6379
2024-02-25 18:05:03 INF cmd/main.go:32 > redis-0.redis
2024-02-25 18:05:03 INF cmd/main.go:32 > 6379
2024-02-25 18:05:07 INF cmd/main.go:32 > redis-1.redis
2024-02-25 18:05:07 INF cmd/main.go:32 > 6379
2024-02-25 18:05:10 INF cmd/main.go:32 > redis-1.redis
2024-02-25 18:05:10 INF cmd/main.go:32 > 6379

从第14行和第16行我们可以看出master节点从redis-0变成了redis-1,证明sentinel可以成功实现failover。

总结

本文首先简要介绍了方案的背景和选型的过程,然后从什么是哨兵和哨兵模式、部署环境、架构说明、方案实施、方案验证等五个方面详细介绍了我们的基于k8s的Redis哨兵高可用部署方案。全套配置、脚本和示例代码已上传到github.com/wubin1989/k…,可以拉下来代码动手实验一下。 该方案已经在线上生产环境成功实施并且稳定运行,希望可以给各位提供参考和启发。有任何疑问请留言,一起交流和进步!