Prometheus Monitoring Study Notes: Prometheus Storage

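0x00 Overview

Prometheus is to Kubernetes (in the monitoring space) what Kubernetes is to container orchestration.

Now that Heapster is no longer developed or maintained and InfluxDB's clustering solution is no longer open source, the Heapster + InfluxDB monitoring stack only suits fairly small k8s clusters. The Prometheus community, by contrast, is very active. The official community provides a range of high-quality exporters, such as node_exporter, and Telegraf (centralized metrics collection) + Prometheus is also a good way to cut down the work of deploying and managing many separate exporters.

Today I will mainly cover some hands-on experience with storage from our company's use of Prometheus.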

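0x01 Prometheus Storage Bottlenecks

As the Prometheus architecture diagram shows, Prometheus provides local storage, namely its TSDB time series database. The advantage of local storage is that it is simple to operate. The drawbacks are that it cannot persist metrics at massive scale and that data can be lost: in practice we have hit several cases where WAL files were corrupted and no further writes were possible.

Data compression did improve a great deal from Prometheus 2.0 onward. To get around the limits of single-node storage, Prometheus did not implement clustered storage itself. Instead it provides remote read and write interfaces, so that users can pick a suitable time series database to scale Prometheus.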

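Prometheus integrates with remote storage systems in the following two ways:

- Prometheus writes metrics to the remote storage in a standard format
- Prometheus reads metrics from a remote URL in a standard format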
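In prometheus.yml these two directions correspond to the remote_write and remote_read sections. The Python sketch below only renders such a fragment as text. The adapter host name, the port, and the /write and /read paths are hypothetical placeholders, not values from our deployment.

```python
def render_remote_storage_config(write_url: str, read_url: str) -> str:
    """Render the remote_write / remote_read fragment of prometheus.yml."""
    lines = [
        "remote_write:",
        f'  - url: "{write_url}"',
        "remote_read:",
        f'  - url: "{read_url}"',
    ]
    return "\n".join(lines) + "\n"


# Hypothetical adapter endpoint, for illustration only.
ADAPTER = "http://clickhouse-adapter.example:9201"
config_fragment = render_remote_storage_config(f"{ADAPTER}/write", f"{ADAPTER}/read")

if __name__ == "__main__":
    print(config_fragment)
```

A storage backend that supports only remote write would simply omit the remote_read section.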

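0x02 The Significance and Value of Persisting Metrics

Monitoring is not only about knowing how the system is running in real time and alerting promptly. The data that monitoring collects is also valuable in the following areas:

- Resource auditing and billing. This requires keeping data for a year or even several years.
- Tracing responsibility for failures.
- Later analysis and mining, possibly with AI: smarter alert rule settings, root cause analysis of failures, and forecasting an application's QPS trend to trigger HPA ahead of time. This is what is now popularly called AIOps.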

0x03 Prometheus Data Persistence Options

3.1 Candidate Solutions

Community solutions that support Prometheus remote read and write:

- AppOptics: write
- Chronix: write
- Cortex: read and write
- CrateDB: read and write
- Elasticsearch: write
- Gnocchi: write
- Graphite: write
- InfluxDB: read and write
- OpenTSDB: write
- PostgreSQL/TimescaleDB: read and write
- SignalFx: write
- ClickHouse: read and write

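3.2 Requirements for the Chosen Solution

- Data safety: fault tolerance and backup must be supported
- Good write performance, with sharding support
- A technical solution that is not complex
- Friendly query syntax for later analysis
- Grafana read support, which is a priority
- Support for both read and write

Based on the points above, ClickHouse fits our use case.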

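ClickHouse is a high-performance column-oriented database. Because it focuses on analytics, it supports a rich set of analytic functions.

Below are several use cases officially recommended by ClickHouse:

- Web and App analytics
- Advertising networks and RTB
- Telecommunications
- E-commerce and finance
- Information security
- Monitoring and telemetry
- Time series
- Business intelligence
- Online games
- Internet of Things

ClickHouse (ck for short) is well suited to storing time series.

In addition, the community already has the graphouse project, which uses ClickHouse as the storage for Graphite.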

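0x04 Performance Testing

4.1 Write Test

On a local Mac, a single ClickHouse instance started with Docker took in the metrics of 3 clusters, averaging 12,910 rows/s, with no write pressure at all. At companies such as ad networks, real-world usage reaches 300,000 rows/s.

4.2 Query Test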

fbe6a4edc3eb :) select count(*) from metrics.samples;
SELECT count(*)
FROM metrics.samples
┌──count()─┐
│ 22687301 │
└──────────┘
1 rows in set. Elapsed: 0.014 sec. Processed 22.69 million rows, 45.37 MB (1.65 billion rows/s., 3.30 GB/s.)

The queries most likely to be slow are:

1) Aggregate sum query

fbe6a4edc3eb :) select sum(val) from metrics.samples where arrayExists(x -> 1 == match(x, 'cid=9'),tags) = 1 and name = 'machine_cpu_cores' and ts > '2017-07-11 08:00:00'
SELECT sum(val)
FROM metrics.samples
WHERE (arrayExists(x -> (1 = match(x, 'cid=9')), tags) = 1) AND (name = 'machine_cpu_cores') AND (ts > '2017-07-11 08:00:00')
┌─sum(val)─┐
│     6324 │
└──────────┘
1 rows in set. Elapsed: 0.022 sec. Processed 57.34 thousand rows, 34.02 MB (2.66 million rows/s., 1.58 GB/s.)

2) GROUP BY query

fbe6a4edc3eb :) select sum(val), time from metrics.samples where arrayExists(x -> 1 == match(x, 'cid=9'),tags) = 1 and name = 'machine_cpu_cores' and ts > '2017-07-11 08:00:00' group by toDate(ts) as time;
SELECT sum(val), time
FROM metrics.samples
WHERE (arrayExists(x -> (1 = match(x, 'cid=9')), tags) = 1) AND (name = 'machine_cpu_cores') AND (ts > '2017-07-11 08:00:00')
GROUP BY toDate(ts) AS time
┌─sum(val)─┬───────time─┐
│     6460 │ 2018-07-11 │
│      136 │ 2018-07-12 │
└──────────┴────────────┘
2 rows in set. Elapsed: 0.023 sec. Processed 64.11 thousand rows, 36.21 MB (2.73 million rows/s., 1.54 GB/s.)

3) Regular expression query

fbe6a4edc3eb :) select sum(val) from metrics.samples where name = 'container_memory_rss' and arrayExists(x -> 1 == match(x, '^pod_name=ofo-eva-hub'),tags) = 1 ;
SELECT sum(val)
FROM metrics.samples
WHERE (name = 'container_memory_rss') AND (arrayExists(x -> (1 = match(x, '^pod_name=ofo-eva-hub')), tags) = 1)
┌─────sum(val)─┐
│ 870016516096 │
└──────────────┘
1 rows in set. Elapsed: 0.142 sec. Processed 442.37 thousand rows, 311.52 MB (3.11 million rows/s., 2.19 GB/s.)

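Summary:

As long as the indexes that were created are put to good use, query performance is very good even with large data volumes.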
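The tag filters in the queries above rely on arrayExists combined with match, which tests each element of the tags array against a regular expression. Judging from the queries, tags appears to hold "key=value" strings. ClickHouse's match finds the pattern anywhere in the string unless it is anchored, so the Python sketch below uses re.search as a stand-in. The sample tag values are hypothetical, and Python's re and RE2 differ in some syntax, though not for patterns this simple.

```python
import re
from typing import List


def tags_match(tags: List[str], pattern: str) -> bool:
    """Mimic arrayExists(x -> match(x, pattern), tags): true if any tag matches."""
    return any(re.search(pattern, tag) is not None for tag in tags)


# Hypothetical tag arrays, for illustration only.
sample_tags = ["cid=9", "pod_name=ofo-eva-hub-1"]
other_tags = ["cid=90", "pod_name=other"]

if __name__ == "__main__":
    print(tags_match(sample_tags, "cid=9"))
    print(tags_match(other_tags, "cid=9"))
    print(tags_match(other_tags, "^cid=9$"))
```

One thing this makes visible: an unanchored pattern such as 'cid=9' also matches a tag like cid=90, so anchoring with ^ and $ is worth considering when an exact label value is intended.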

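0x05 Solution Design

A few points about this architecture:

- Each k8s cluster deploys one Prometheus-ClickHouse-Adapter. This component is covered in detail below.
- ClickHouse is deployed as a cluster, which needs a ZooKeeper (zk) cluster for consistent replication of table data.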

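The ClickHouse cluster layout is as follows:

- ReplicatedMergeTree + Distributed. With ReplicatedMergeTree, tables that share the same ZK path replicate data to each other; note that the replication is mutual.
- Each IDC has 3 shards, each holding 1/3 of the data.
- Every node relies on ZK, and each has 2 replicas.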
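To make the layout concrete, the Python sketch below enumerates the nodes of one IDC, reading the description as 3 shards with 2 replicas each. The ZooKeeper path scheme and the replica names are hypothetical; the point is only that replicas of the same shard share a ZK path, which is what makes them replicate to each other, while different shards use different paths.

```python
from typing import Dict, List


def build_layout(shards: int, replicas: int, table: str) -> List[Dict[str, object]]:
    """List every node with its shard number, replica name, and ZK path."""
    nodes: List[Dict[str, object]] = []
    for shard in range(1, shards + 1):
        # Hypothetical path scheme: one ZK path per shard and table.
        zk_path = f"/clickhouse/tables/{shard}/{table}"
        for replica in range(1, replicas + 1):
            nodes.append(
                {
                    "shard": shard,
                    "replica": f"replica-{shard}-{replica}",
                    "zk_path": zk_path,
                }
            )
    return nodes


layout = build_layout(shards=3, replicas=2, table="samples")

if __name__ == "__main__":
    for node in layout:
        print(node)
```

A Distributed table then sits on top of these local tables and fans queries out to one replica per shard.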

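For the detailed steps and reasoning, see the article "ClickHouse集群搭建从0到1" (Building a ClickHouse Cluster from 0 to 1). Thanks to Peng from Sina for the guidance.

Notes on deploying the zk cluster:

- Install ZooKeeper 3.4.9 or a later stable version.
- Do not use zk's default configuration. The default configuration is a time bomb.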

# The ZooKeeper server won't delete files from old snapshots and logs when using the default configuration (see autopurge), and this is the responsibility of the operator.

The zoo.cfg configuration given in the official ClickHouse documentation is as follows:

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=30000
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=10
maxClientCnxns=2000
maxSessionTimeout=60000000
# the directory where the snapshot is stored.
dataDir=/opt/zookeeper/{{ cluster['name'] }}/data
# Place the dataLogDir to a separate physical disc for better performance
dataLogDir=/opt/zookeeper/{{ cluster['name'] }}/logs
autopurge.snapRetainCount=10
autopurge.purgeInterval=1
# To avoid seeks ZooKeeper allocates space in the transaction log file in
# blocks of preAllocSize kilobytes. The default block size is 64M. One reason
# for changing the size of the blocks is to reduce the block size if snapshots
# are taken more often. (Also, see snapCount).
preAllocSize=131072
# Clients can submit requests faster than ZooKeeper can process them,
# especially if there are a lot of clients. To prevent ZooKeeper from running
# out of memory due to queued requests, ZooKeeper will throttle clients so that
# there is no more than globalOutstandingLimit outstanding requests in the
# system. The default limit is 1,000. ZooKeeper logs transactions to a
# transaction log. After snapCount transactions are written to a log file a
# snapshot is started and a new transaction log file is started. The default
# snapCount is 10,000.
snapCount=3000000
# If this option is defined, requests will be logged to a trace file named
# traceFile.year.month.day.
#traceFile=
# Leader accepts client connections. Default value is "yes". The leader machine
# coordinates updates. For higher update throughput at the slight expense of
# read throughput the leader can be configured to not accept clients and focus
# on coordination.
leaderServes=yes
standaloneEnabled=false
dynamicConfigFile=/etc/zookeeper-{{ cluster['name'] }}/conf/zoo.cfg.dynamic
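The setting that defuses the "time bomb" is autopurge: ZooKeeper's default autopurge.purgeInterval is 0, which means old snapshots and transaction logs are never cleaned up automatically. As a small safeguard, a deployment script could check a zoo.cfg before rolling it out. The Python sketch below is one way to do that; the function names and the sample text are illustrative only.

```python
from typing import Dict


def parse_zoo_cfg(text: str) -> Dict[str, str]:
    """Parse key=value lines from zoo.cfg, skipping blanks and # comments."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def autopurge_enabled(settings: Dict[str, str]) -> bool:
    """A missing or zero purgeInterval means ZooKeeper never purges."""
    return int(settings.get("autopurge.purgeInterval", "0")) > 0


# Shortened sample in the same format as the configuration above.
SAMPLE_CFG = """\
# The number of milliseconds of each tick
tickTime=2000
autopurge.snapRetainCount=10
autopurge.purgeInterval=1
#traceFile=
"""

if __name__ == "__main__":
    print(autopurge_enabled(parse_zoo_cfg(SAMPLE_CFG)))
    print(autopurge_enabled(parse_zoo_cfg("tickTime=2000\n")))
```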

The ClickHouse configuration file differs somewhat from version to version. Here is one from version 390:

information /data/ck/log/clickhouse-server.log /data/ck/log/clickhouse-server.err.log 1000M 10 < 9000 /etc/clickhouse-server/server.crt /etc/clickhouse-server/server.key /etc/clickhouse-server/dhparam.pem none true true sslv2,sslv3 true true true sslv2,sslv3 true RejectCertificateHandler 0.0.0.0 4096 3 100 8589934592 5368709120 /data/ck/data/ /data/ck/tmp/ /data/ck/user_files/ users.xml default default false ck11.ruly.xxx.net 9000 ck12.ruly.xxx.net 9000 zk1.ruly.xxx.net 2181 zk2.ruly.xxx.net 2181 zk3.ruly.xxx.net 2181 1 ck11.ruly.ofo.net 3600 3600 60 system query_log toYYYYMM(event_date) 7500 *_dictionary.xml /clickhouse/task_queue/ddl /var/lib/clickhouse/format_schemas/

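0x06 The Prometheus-ClickHouse-Adapter Component

Prometheus-ClickHouse-Adapter (Prom2click) is an adapter that lets ClickHouse serve as remote storage for Prometheus data.

The project lacks logging, which is not good enough for real production use, and some of its database connection handling is also incomplete. We have submitted the improvements we made during production use as a PR.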

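In practice, pay attention to the volume of concurrent writes and adjust the ch.batch startup parameter accordingly. It is the number of samples written to ClickHouse per batch, and we currently set it to 65536. The reason is that ClickHouse's MergeTree engine has a limit of 300, and exceeding it produces this error:

Too many parts (300). Merges are processing significantly slower than inserts

The 300 refers to the number of parts being processed, not to the number of rows in a single batch insert.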
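A rough calculation shows why a larger batch helps. Each INSERT creates at least one new data part, so the fewer inserts per minute, the easier it is for background merges to keep up. The Python sketch below assumes a single adapter that flushes once its buffer reaches ch.batch samples, and uses the 12,910 samples/s average from the write test; the function names are illustrative only.

```python
def seconds_between_inserts(samples_per_second: float, batch_size: int) -> float:
    """Time needed to fill one batch at a steady ingest rate."""
    if samples_per_second <= 0 or batch_size <= 0:
        raise ValueError("rate and batch size must be positive")
    return batch_size / samples_per_second


def inserts_per_minute(samples_per_second: float, batch_size: int) -> float:
    """Number of batch inserts issued per minute at a steady ingest rate."""
    return 60.0 / seconds_between_inserts(samples_per_second, batch_size)


if __name__ == "__main__":
    rate = 12910  # samples/s, from the write test
    for batch in (1000, 8192, 65536):
        print(batch, round(inserts_per_minute(rate, batch), 1))
```

With a batch of 65536, that rate gives roughly one insert every 5 seconds, whereas a batch of 1000 would issue several hundred inserts per minute.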


342 2022-11-01
