Who am I?
What's time series data? The fundamental of time series data
Related works about time series database benchmark
Why we choose the ClickHouse?
Introducing the ClickHouse
ClickHouse setup, configuring and troubleshooting
More references about the ClickHouse
Peter, GitHub
在GitHub活躍的開源專案貢獻者
Speaker
COSCUP、MOPCON......
Engineer
DevOps
Back-end
System Architecture Researching
Web Application Security
PHP, Python and JavaScript
Industrial Technology Research Institute
Smart Grid Technology (2017~2021)
Institute for Information Industry
Database, Data platform architecture (2021~)
IoT sensor datasets
Daily/Weekly/Monthly/Yearly temperature datasets
Smart Meter datasets
Water Meter datasets
And so on.
Closed Loop Benchmark for Timeseries Databases
SciTS: A Benchmark for Time-Series Database in Scientific
Experiments and Industrial Internet of Things
TSM-Bench: Benchmarking Time Series Database Systems for Monitoring Applications
OLAP, Online analytical processing
column-oriented database
Time series PostgreSQL
Relational Database System
M1硬體規格
Intel Xeon CPU E5-1620 v2 @ 3.70GHz of 8 logical cores
32GB DDR3 RAM
24 physical drives formatted with XFS and configured with
RAID60 and connected using a 6 Gbit/sec SAS-2 controller
用來建置DB server
M2硬體規格
Intel Xeon CPU E5-2680 v3 @ 2.50GHz over 2 sockets of 48 logical cores in total
512GB DDR4 RAM
扮演client machine
測試的候選資料庫皆使用預設設定來運行
測試的資料表綱要
主要欄位有:timestamp, sensor_id, value
8 bytes for timestamp
8 bytes long integer as the sensor ID
8 bytes double-precision float for value.
將timestamp欄位建立索引
Data Ingestion資料寫入
Scaling Workload
每個目標資料庫同使併發使用48個clients
每個batch大小為20,000筆資料並總計2.8 billion(28億)筆資料
每個量測效能的資料庫皆有28億筆資料
ClickHouse表現出有較好的寫入效能,每秒可以寫入120萬筆資料
Queries Latency查詢延遲時間
填入了28億資料中進行查詢
總計為15天測量的資料
由100,000個感測器完成
每個查詢執行20次,每個查詢之間,清除資料表和系統與快取
確保查詢的結果都是從硬碟中查到的
Q1、以10分鐘區間找到10個不同感測器的資料出來
Queries Latency查詢延遲時間
Q2、從180分鐘區間找到範圍外(異常)的資料
較為複雜查詢在PostgreSQL中有瓶頸(B-Tree Index瓶頸)
Queries Latency查詢延遲時間
Q3、從15天中,10個不同感測器,60分鐘區間撈出資料,計算標準差
Queries Latency查詢延遲時間
Q4、從15天中,10個不同感測器,以小時間隔撈出一天的資料
Queries Latency查詢延遲時間
Q5、從15天中,2個不同感測器,以小時間隔撈出一天的資料
ClickHouse
有效率的儲存引擎
支援非常高的寫入速度(1.3 million/秒)
48個client數量同時併發連線之下,維持低CPU、低記憶體和硬碟I/O使用
查詢上有低延遲的時間,是因為有幾個原因如下
partitioned data management
sparse indexing algorithm
mix of compiled vectorized query execution
InfluxDB
LSM-Tree設計所以在寫入資料可高達每秒790,000筆資料
Concurrent表現出弱勢,CPU資源因textual protocol耗盡
Memory與disk I/O效能有效率
查詢延遲時間沒與ClickHouse一樣低 (LSM-Tree 犧牲read效能)
Log-structured merge-tree
TimescaleDB
與PosgreSQL比較,有很大的改良,寫入資料可每秒490,000筆資料
改善了系統資源使用量,與PostgreSQL比較來說也較佳
改良同時也影響了查詢延遲時間,查詢延遲時間也比PostgreSQL好
PostgreSQL
傳統的ACID設計的資料庫
原子性(Atomicity)
一致性(Consistency)
事務隔離(Isolation)
持久性(Durability)
系統資源使用量,不論是CPU資源、記憶體與disk I/O都沒有效率
Column-format資料有效的做到
資料分區並分成多個部分並擁有低延遲的查詢速率
對於系統資源使用量來說
有效率的RAM管理與使用
有效率的硬碟I/O使用與管理
Hardware requirements
CPU指令集
SSE 4.2 support
grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"
4 CPU cores at least
8 GB RAM at least
Ubuntu 20.04為例
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates dirmngr
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 8919F6BD2B48D754
echo "deb [arch=amd64] https://packages.clickhouse.com/deb stable main" | \
sudo tee /etc/apt/sources.list.d/clickhouse.list
sudo apt-get update
sudo apt-get install -y clickhouse-client clickhouse-server
Ubuntu 20.04為例
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates dirmngr
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 8919F6BD2B48D754
echo "deb [arch=amd64] https://packages.clickhouse.com/deb stable main" | \
sudo tee /etc/apt/sources.list.d/clickhouse.list
sudo apt-get update
sudo apt-get install -y clickhouse-client clickhouse-server
......
useradd -r --shell /bin/false --home-dir /nonexistent -g clickhouse-bridge clickhouse-bridge
chown -R clickhouse-bridge:clickhouse-bridge '/usr/bin/clickhouse-odbc-bridge'
chown -R clickhouse-bridge:clickhouse-bridge '/usr/bin/clickhouse-library-bridge'
Enter password for default user:
Password for default user is saved in file /etc/clickhouse-server/users.d/default-password.xml.
Setting capabilities for clickhouse binary. This is optional.
chown -R clickhouse:clickhouse '/etc/clickhouse-server'
ClickHouse has been successfully installed.
$ sudo systemctl is-enabled clickhouse-server.service
enabled
$ sudo systemctl status clickhouse-server.service
● clickhouse-server.service - ClickHouse Server (analytic DBMS for big data)
Loaded: loaded (/lib/systemd/system/clickhouse-server.service; enabled; vendor preset: enabled)
Active: inactive (dead)
$ sudo systemctl start clickhouse-server.service
$ sudo systemctl status clickhouse-server.service
Ubuntu 20.04為例
$ clickhouse-client --password
ClickHouse client version 23.9.3.12 (official build).
Password for user (default):
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 23.9.3 revision 54466.
Warnings:
* Maximum number of threads is lower than 30000. There could be problems with handling a lot of simultaneous queries.
peterli-VirtualBox :) select version();
SELECT version()
Query id: f261aa63-39ff-4e61-959e-978f2f09dfda
┌─version()─┐
│ 23.9.3.12 │
└───────────┘
1 row in set. Elapsed: 0.004 sec.
peterli-VirtualBox :)
pip install clickhouse-connect
import clickhouse_connect
clickhouse_client = clickhouse_connect.get_client(
host=host,
username=username,
password=password,
port=port,
database=db_name
)
clickhouse_client.query('CREATE DATABASE IF NOT EXISTS 3e_green_ems')
table_lists = clickhouse_client.query('SHOW TABLES')
for device in table_lists.result_rows:
......
Ubuntu 20.04為例
Warnings:
* Maximum number of threads is lower than 30000. There could be problems with handling a lot of simultaneous queries.
Ubuntu 20.04為例
Warnings:
* Maximum number of threads is lower than 30000. There could be problems with handling a lot of simultaneous queries.
LimitNPROC=64000
[Service]
Type=notify
......
LimitCORE=infinity
LimitNOFILE=500000
LimitNPROC=64000
$ sudo systemctl daemon-reload
$ sudo systemctl restart clickhouse-server.service
$ clickhouse-client --password
ClickHouse client version 23.9.3.12 (official build).
Password for user (default):
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 23.9.3 revision 54466.
peterli-VirtualBox :) Bye.
Papers
SciTS: A Benchmark for Time-Series Database in ScientificExperiments and Industrial Internet of Things
ClickHouse:時序資料庫建置與運行(2022 iThome 鐵人賽文章)
Official ClickHouse documentation
More ClickHouse tutorials
ClickHouse hardware benchmarking
ClickHouse troubleshooting