Kubernetes Cluster Practice (07): Deploying Elastic with the ECK Operator


An aside: one strong motivation for insisting on Kubernetes for our existing workloads was that the old VM-based Elasticsearch deployment could no longer keep up with business growth; running a large Elastic cluster on virtual machines had become a nightmare. So we committed to deploying in containers. Containers not only made deployment fast; orchestrating them with Kubernetes also simplified day-to-day operation of the Elastic cluster. Of course, once the Kubernetes cluster was up, deploying an Elastic cluster (a stateful workload) brought its own difficulties, such as service exposure and persistent storage, but these can be worked through step by step.

Background

Kubernetes Operator

An Operator, a concept developed by CoreOS, extends the Kubernetes API with an application-specific controller used to create, configure, and manage complex stateful applications such as databases, caches, and monitoring systems. Operators build on Kubernetes' resource and controller concepts while also encoding application-specific domain knowledge; the key to building one is the design of its CRDs (Custom Resource Definitions). Since version 1.7, Kubernetes has supported custom controllers, which let developers add new functionality, extend existing behavior, and automate administrative tasks. These custom controllers behave like native Kubernetes components: Operators are written directly against the Kubernetes API, so they can watch the cluster, modify Pods and Services, and scale running applications according to the rules built into the controller.
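As a rough, hypothetical illustration of a CRD (the group, kind, and schema below are invented for this example and are not part of ECK), a minimal definition looks like this:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: caches.example.com          # hypothetical resource, for illustration only
spec:
  group: example.com
  scope: Namespaced
  names:
    kind: Cache
    plural: caches
    singular: cache
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                replicas:           # the controller watches Cache objects and reconciles this field
                  type: integer

An Operator then pairs such a CRD with a controller that reconciles the declared state, exactly as ECK does for its Elasticsearch and Kibana resources.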

ECK

Elastic Cloud on Kubernetes (ECK) extends Kubernetes' basic orchestration capabilities to support setting up and managing Elasticsearch, Kibana, and APM Server on Kubernetes. ECK simplifies all of the key operations:

- Managing and monitoring multiple clusters
- Scaling clusters up and down
- Changing cluster configuration
- Scheduling backups
- Securing clusters with TLS certificates
- Implementing a hot-warm-cold architecture with zone awareness

Installing the ECK Operator

Online installation

For an online installation, apply the official all-in-one operator manifest with kubectl apply -f. For an offline environment, first export the required images with docker save -o on a machine that can pull them. Because the manifest does not pin Pods to a particular physical node, in an offline environment without a private registry the exported images must be loaded onto every node with docker load -i <image-file.tar.gz>. Note: the current official Elastic images are amd64 only! Recommendation: to make later deployments easier, set up a private image registry; I will cover building one in a later installment of this series.
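A sketch of both paths, assuming ECK 1.0.1 and the 7.6.0 Elastic images used later in this article (the manifest URL and image tags are assumptions; adjust them to your versions):

# Online installation: apply the official operator manifest
kubectl apply -f https://download.elastic.co/downloads/eck/1.0.1/all-in-one.yaml

# Offline preparation: pull and export the images on a machine with Internet access
docker pull docker.elastic.co/eck/eck-operator:1.0.1
docker pull docker.elastic.co/elasticsearch/elasticsearch:7.6.0
docker pull docker.elastic.co/kibana/kibana:7.6.0
docker save -o eck-images.tar \
  docker.elastic.co/eck/eck-operator:1.0.1 \
  docker.elastic.co/elasticsearch/elasticsearch:7.6.0 \
  docker.elastic.co/kibana/kibana:7.6.0

# ...then load them on every node of the offline cluster
docker load -i eck-images.tar

# Verify the operator is running (ECK installs into the elastic-system namespace)
kubectl -n elastic-system get pods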

Deploying Elasticsearch

In my current physical environment I have a Huawei E9000 blade chassis with 16 CH121 V5 compute nodes. Each compute node has only two 2.5-inch drive bays and cannot provide much storage, so I carved the adjacent IP SAN into 16 LUNs and attached one to each CH121 V5 node. (The 16 compute nodes do not form a Linux cluster, so they cannot share one large LUN.) Each CH121 V5 uses its attached space as a local PV, providing persistent storage for the ES data nodes. Why not let the Pods use iSCSI directly? Mainly because I have not figured that out yet :(
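For reference, preparing one of the attached LUNs as the mount point a local PV will use might look roughly like this; /dev/sdb is a placeholder for the iSCSI LUN, and the path matches the data-pv paths used below:

# Format the LUN and mount it where the local PV will point
mkfs.xfs /dev/sdb                      # /dev/sdb is a placeholder for the attached iSCSI LUN
mkdir -p /vol/eck/data-pv00
mount /dev/sdb /vol/eck/data-pv00
# Make the mount survive reboots; _netdev delays mounting until the network (iSCSI) is up
echo '/dev/sdb /vol/eck/data-pv00 xfs defaults,_netdev 0 0' >> /etc/fstab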

Configuring PVs

Edit the file 1-test-pv.yaml:

# set namespace for this elasticsearch cluster
apiVersion: v1
kind: Namespace
metadata:
  name: test
---
# master-eligible storage class
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nf5270m4-es-master
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
# master pv0
apiVersion: v1
kind: PersistentVolume
metadata:
  name: test-es-master-pv0
  labels:
    pvname: test-es-master-pv0
spec:
  capacity:
    storage: 32Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle  # Retain also works for persistent data, but note how a Released PV is brought back to Available for reuse
  storageClassName: nf5270m4-es-master
  local:
    path: /home/elastic-data/es-master-pv0
  # node affinity: pin this PV to the node labeled kubernetes.io/hostname=nf5270m4 (an Inspur server)
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - nf5270m4
---
# master pv1: a second PV for the master-eligible nodes, identical to pv0 with pv0 replaced by pv1; definition omitted to save space
...
---
# coordinate storage class
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nf5270m4-es-coordinate
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
# coordinate pv0
apiVersion: v1
kind: PersistentVolume
metadata:
  name: test-es-coordinate-pv0
  labels:
    pvname: test-es-coordinate-pv0
spec:
  capacity:
    storage: 32Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle  # Recycle scrubs the PV's data when its claim is deleted and returns the PV to Available
  storageClassName: nf5270m4-es-coordinate
  local:
    path: /home/elastic-data/es-coordinate-pv0
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - nf5270m4
---
# the storage class, PV, and PVC for the ingest nodes are configured the same way; omitted
...
---
# data storage class
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: e9k-es-data
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
# data pv00
apiVersion: v1
kind: PersistentVolume
metadata:
  name: test-es-data-pv00
  labels:
    pvname: test-es-data-pv00
spec:
  capacity:
    storage: 2Ti
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  storageClassName: e9k-es-data
  local:
    path: /vol/eck/data-pv00  # the host directory where the iSCSI LUN is mounted
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - e9k-1-01
...
# the remaining data PVs are similar and not listed one by one
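Apply the manifest and confirm the objects were created; with volumeBindingMode: WaitForFirstConsumer the PVs stay Available until a consuming Pod is scheduled:

kubectl apply -f 1-test-pv.yaml
kubectl get storageclass
kubectl get pv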

Configuring PVCs

Edit the file 2-test-pvc.yaml. The PV and PVC definitions are kept in separate files so they can be operated on independently (for example, deleting only the PVCs when the PV reclaim policy is Retain):

# master pvc0
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-es-master-pvc0
  namespace: test
spec:
  resources:
    requests:
      storage: 32Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: nf5270m4-es-master
  volumeName: test-es-master-pv0
  selector:
    matchLabels:
      pvname: test-es-master-pv0
---
# master pvc1 is configured like pvc0; omitted
...
---
# coordinate pvc0
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-es-coordinate-pvc0
  namespace: test
spec:
  resources:
    requests:
      storage: 32Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: nf5270m4-es-coordinate
  volumeName: test-es-coordinate-pv0
  selector:
    matchLabels:
      pvname: test-es-coordinate-pv0
---
# the PVCs for the ingest nodes are configured the same way; omitted
...
---
# data pvc00
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-es-data-pvc00
  namespace: test
spec:
  resources:
    requests:
      storage: 2Ti
  accessModes:
    - ReadWriteOnce
  storageClassName: e9k-es-data
  volumeName: test-es-data-pv00
  selector:
    matchLabels:
      pvname: test-es-data-pv00
---
# the PVCs for the other data nodes are similar; omitted
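Apply the claims and check their status; because the storage classes use WaitForFirstConsumer, the claims remain Pending until the Elasticsearch Pods that consume them are scheduled:

kubectl apply -f 2-test-pvc.yaml
kubectl get pvc -n test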

The master-voting nodeSet only takes part in master elections and cannot itself be elected master, so it is not given persistent storage.

Configuring the ES and Kibana nodes

Edit 3-test-eck.yaml:

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: test
  namespace: test
spec:
  version: 7.6.0
  image: nexus.internal.test:8082/amd64/elasticsearch:7.6.0  # my private registry address
  imagePullPolicy: IfNotPresent
  updateStrategy:
    changeBudget:
      maxSurge: 2        # default is -1: all new Pods are created at once, which briefly consumes a lot of resources before the old Pods are replaced
      maxUnavailable: 1  # default is 1
  podDisruptionBudget:
    spec:
      minAvailable: 1    # default is 1
      selector:
        matchLabels:
          elasticsearch.k8s.elastic.co/cluster-name: test  # the value of metadata.name
  nodeSets:
    # master-eligible nodes
    - name: master-eligible
      count: 2
      config:
        node.master: true
        node.data: false
        node.ingest: false
        node.ml: false
        node.store.allow_mmap: false
        xpack.ml.enabled: false
        cluster.remote.connect: false
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 32Gi
            storageClassName: nf5270m4-es-master
      podTemplate:
        metadata:
          labels:
            app: master-eligible
        spec:
          # node selection and toleration: the Inspur server nf5270m4 hosts the private registry and normally does not schedule Pods
          nodeSelector:
            "kubernetes.io/hostname": nf5270m4
          tolerations:
            - key: "node-role.kubernetes.io/node"
              operator: "Exists"
              effect: "PreferNoSchedule"
          containers:
            # resource requests and limits
            - name: elasticsearch
              resources:
                requests:
                  cpu: 2        # unlimited by default
                  memory: 16Gi  # default is 2Gi
                limits:
                  # cpu:        # not set here and not set by default, so unlimited
                  memory: 24Gi  # default is 2Gi
              env:
                - name: ES_JAVA_OPTS  # default heap is 1g
                  value: -Xms10g -Xmx10g
          initContainers:
            - name: sysctl
              securityContext:
                privileged: true
              command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
    # voting-only master node
    - name: master-voting
      count: 1
      config:
        node.master: true
        node.voting_only: true  # default is false
        node.data: false
        node.ingest: false
        node.ml: false
        node.store.allow_mmap: false
        xpack.ml.enabled: false
        cluster.remote.connect: false
      podTemplate:
        metadata:
          labels:
            app: master-voting
        spec:
          nodeSelector:
            "kubernetes.io/hostname": nf5270m4
          tolerations:
            - key: "node-role.kubernetes.io/node"
              operator: "Exists"
              effect: "PreferNoSchedule"
          initContainers:
            - name: sysctl
              securityContext:
                privileged: true
              command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
          containers:
            - name: elasticsearch
              resources:
                requests:
                  cpu: 1       # default is not set
                  memory: 2Gi  # default is 2Gi
                limits:
                  cpu: 1       # default is not set
                  memory: 2Gi  # default is 2Gi
              env:
                - name: ES_JAVA_OPTS  # default heap is 1g
                  value: -Xms1g -Xmx1g
          volumes:
            - name: elasticsearch-data
              emptyDir: {}  # use an empty directory instead of a persistent volume
    # ingest nodes
    - name: ingest
      count: 1
      config:
        node.master: false
        node.data: false
        node.ingest: true
        node.ml: false
        node.store.allow_mmap: false
        cluster.remote.connect: false
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 32Gi
            storageClassName: nf5270m4-es-ingest
      podTemplate:
        metadata:
          labels:
            app: ingest
        spec:
          # node selection and toleration: nf5270m4 hosts the private registry and normally does not schedule Pods
          nodeSelector:
            "kubernetes.io/hostname": nf5270m4
          tolerations:
            - key: "node-role.kubernetes.io/node"
              operator: "Exists"
              effect: "PreferNoSchedule"
          containers:
            # resource requests and limits
            - name: elasticsearch
              resources:
                requests:
                  cpu: 1        # unlimited by default
                  memory: 8Gi   # default is 2Gi
                limits:
                  # cpu:        # not set here and not set by default, so unlimited
                  memory: 16Gi  # default is 2Gi
              env:
                - name: ES_JAVA_OPTS  # default heap is 1g
                  value: -Xms10g -Xmx10g
          initContainers:
            - name: sysctl
              securityContext:
                privileged: true
              command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
    # coordinating-only node
    - name: coordinate
      count: 1
      config:
        node.master: false
        node.data: false
        node.ingest: false
        node.ml: false
        node.store.allow_mmap: false
        cluster.remote.connect: false
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 32Gi
            storageClassName: nf5270m4-es-coordinate
      podTemplate:
        metadata:
          labels:
            app: coordinate
        spec:
          # node selection and toleration: nf5270m4 hosts the private registry and normally does not schedule Pods
          nodeSelector:
            "kubernetes.io/hostname": nf5270m4
          tolerations:
            - key: "node-role.kubernetes.io/node"
              operator: "Exists"
              effect: "PreferNoSchedule"
          containers:
            # resource requests and limits
            - name: elasticsearch
              resources:
                requests:
                  cpu: 4        # unlimited by default
                  memory: 32Gi  # default is 2Gi
                limits:
                  # cpu:        # not set here and not set by default, so unlimited
                  memory: 48Gi  # default is 2Gi
              env:
                - name: ES_JAVA_OPTS  # default heap is 1g
                  value: -Xms16g -Xmx16g
          initContainers:
            - name: sysctl
              securityContext:
                privileged: true
              command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
    # data nodes
    - name: data
      count: 64
      config:
        node.master: false
        node.data: true
        node.ingest: false
        node.ml: false
        node.store.allow_mmap: false
        cluster.remote.connect: false
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 2Ti
            storageClassName: e9k-es-data
      podTemplate:
        metadata:
          labels:
            app: data
        spec:
          affinity:
            podAntiAffinity:
              preferredDuringSchedulingIgnoredDuringExecution:
                - weight: 100
                  podAffinityTerm:
                    labelSelector:
                      matchLabels:
                        elasticsearch.k8s.elastic.co/cluster-name: test
                    topologyKey: kubernetes.io/hostname
          containers:
            # resource requests and limits
            - name: elasticsearch
              resources:
                requests:
                  cpu: 2        # unlimited by default
                  memory: 48Gi  # default is 2Gi
                limits:
                  # cpu:        # not set here and not set by default, so unlimited
                  memory: 64Gi  # default is 2Gi
              env:
                - name: ES_JAVA_OPTS  # default heap is 1g
                  value: -Xms31g -Xmx31g
          initContainers:
            - name: sysctl
              securityContext:
                privileged: true
              command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
---
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: test
  namespace: test
spec:
  version: 7.6.0
  image: nexus.internal.test:8082/amd64/kibana:7.6.0  # my private registry address
  imagePullPolicy: IfNotPresent
  count: 1
  elasticsearchRef:
    name: "test"  # the name of the ES cluster to connect to
  http:
    tls:
      selfSignedCertificate:
        disabled: true  # serve over plain HTTP
  podTemplate:
    spec:
      nodeSelector:
        "kubernetes.io/hostname": nf5270m4
      tolerations:
        - key: "node-role.kubernetes.io/node"
          operator: "Exists"
          effect: "PreferNoSchedule"
      containers:
        - name: kibana
          resources:
            requests:
              cpu: 1
              memory: 2Gi
            limits:
              memory: 64Gi
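To apply the spec and watch the rollout, I use commands like the following; the secret name test-es-elastic-user assumes ECK's <cluster-name>-es-elastic-user convention for the cluster named test:

kubectl apply -f 3-test-eck.yaml

# Watch the Elasticsearch and Kibana resources converge (health, phase, version)
kubectl get elasticsearch,kibana -n test
kubectl get pods -n test -l elasticsearch.k8s.elastic.co/cluster-name=test

# Retrieve the auto-generated password of the built-in elastic user
kubectl get secret test-es-elastic-user -n test \
  -o go-template='{{.data.elastic | base64decode}}'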

Note: for the official installation steps, see https://elastic.co/guide/en/cloud-on-k8s/current/k8s-quickstart.html

Exposing the services

Services are exposed through the Traefik instance set up earlier. First edit the existing Traefik manifest, again using host port mapping (to reduce NAT overhead):

spec:
  template:
    spec:
      containers:
        - name: traefik
          ports:
            # added port
            - name: elasticsearch
              containerPort: 9200
              hostPort: 9200
          ...
          args:
            # added entryPoint
            - --entrypoints.elasticsearch.Address=:9200

Since Kibana is accessed through the browser, it simply reuses port 80. Edit 4-test-route.yaml:

apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: test-kibana-route
  namespace: test
spec:
  entryPoints:
    - web
  routes:
    - match: Host(`kibana`, `kibana.internal.pla95929`)
      kind: Rule
      services:
        - name: test-kb-http  # backend service name
          port: 5601          # backend Kubernetes service port
---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: test-es-route
  namespace: test
spec:
  entryPoints:
    - elasticsearch
  routes:
    - match: Host(`es`, `es.internal.pla95929`)
      kind: Rule
      services:
        - name: test-es-http  # backend service name
          port: 9200          # backend Kubernetes service port
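After applying the routes, a quick sanity check against the backend service itself (before testing through Traefik); the secret name follows ECK's convention, and -k is needed because only Kibana's TLS was disabled above, so the Elasticsearch service still serves HTTPS with a self-signed certificate:

kubectl apply -f 4-test-route.yaml

# Password of the built-in elastic user
PASSWORD=$(kubectl get secret test-es-elastic-user -n test \
  -o go-template='{{.data.elastic | base64decode}}')

# Talk to the Elasticsearch service directly via a port-forward
kubectl port-forward service/test-es-http 9200 -n test &
curl -k -u "elastic:$PASSWORD" https://localhost:9200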

Note: the service names and ports can be checked with the following command:

kubectl get svc -n test

Appendix:

NFS mount error

The Kibana node does not strictly need persistent storage, but templates and settings created inside it are easily lost, so it is worth backing it with a persistent volume. Its storage requirements are modest, so a simple backend such as NFS is enough. A small problem may come up:

mount: wrong fs type, bad option, bad superblock on 192.168.0.106:/home/nfs1,
missing codepage or helper program, or other error
(for several filesystems (e.g. nfs, cifs) you might
need a /sbin/mount.<type> helper program)
In some cases useful info is found in syslog - try
dmesg | tail or so

The cause is that the node the Pod was scheduled onto cannot find the /sbin/mount.<type> helper, so check whether the NFS client packages are installed. I use CentOS 7, where they are installed with:

yum install nfs-utils

For Kibana's persistent storage, here is my configuration:

kind: PersistentVolume
apiVersion: v1
metadata:
  name: kb-pv
spec:
  capacity:
    storage: 16Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: kb-data
  nfs:
    path: /home/data
    server: 172.17.1.254

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: kb-pvc
spec:
  resources:
    requests:
      storage: 16Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: kb-data
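The PV/PVC above still needs to be mounted into the Kibana Pod. A sketch of the relevant podTemplate fragment for the Kibana spec shown earlier, assuming the official image's default data directory /usr/share/kibana/data (the claim must be created in the same namespace as the Kibana resource):

  podTemplate:
    spec:
      containers:
        - name: kibana
          volumeMounts:
            - name: kb-data
              mountPath: /usr/share/kibana/data  # default Kibana data directory (assumption)
      volumes:
        - name: kb-data
          persistentVolumeClaim:
            claimName: kb-pvc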

PVC cannot be deleted

The usual deletion order is: delete the Pod first, then the PVC, and finally the PV. Sometimes, though, the PV stays stuck in the "Terminating" state and cannot be deleted.

Workaround: clear the finalizer record in Kubernetes directly:

kubectl patch pv xxx -p '{"metadata":{"finalizers":null}}'

Reference:

This happens when persistent volume is protected. You should be able to cross verify this:

Command:

kubectl describe pvc PVC_NAME | grep Finalizers

Output:

Finalizers: [kubernetes.io/pvc-protection]

You can fix this by setting finalizers to null using kubectl patch:

kubectl patch pvc PVC_NAME -p '{"metadata":{"finalizers": []}}' --type=merge
