MOPCON 2023
My journey of the time series database
Peter
2023/11/12
Slide
Outlines
-
Who am I?
-
What's time series data? The fundamental of time series data
-
Related works about time series database benchmark
-
Why we choose the ClickHouse?
-
Introducing the ClickHouse
-
ClickHouse setup, configuring and troubleshooting
-
More references about the ClickHouse
Who am I?
-
Peter, GitHub
-
在GitHub活躍的開源專案貢獻者
-
Speaker
-
COSCUP、MOPCON......
-
-
Engineer
-
DevOps
-
Back-end
-
System Architecture Researching
-
Web Application Security
-
PHP, Python and JavaScript
-
-
Industrial Technology Research Institute
-
Smart Grid Technology (2017~2021)
-
-
Institute for Information Industry
-
Database, Data platform architecture (2021~)
-
What's time series data?
-
索引與timestamp時間戳記有關
-
資料量大小會持續的增大
-
在一個範圍中,通常會進行
-
聚合(aggregated)
-
降採樣(down-sampled)
-
查詢(queries)
-
-
有密集寫入的效能需求
The fundamental of time series data
Time series data examples
-
IoT sensor datasets
-
Daily/Weekly/Monthly/Yearly temperature datasets
-
Smart Meter datasets
-
Water Meter datasets
-
And so on.
Related work
-
Closed Loop Benchmark for Timeseries Databases
-
SciTS: A Benchmark for Time-Series Database in Scientific
Experiments and Industrial Internet of Things -
TSM-Bench: Benchmarking Time Series Database Systems for Monitoring Applications
It's about the time series database benchmark
Choosing the correct time series database
Related work
Why papers are lacking? From 2019 to 2020
Comparing the four databases
-
ClickHouse
OLAP, Online analytical processing
column-oriented database
InfluxDB
-
TimescaleDB
Time series PostgreSQL
-
PostgreSQL
Relational Database System
Experiment setup
-
利用科學實驗(KATRIN)作為使用者案例來評估TSDB
-
卡爾斯魯厄氚中微子實驗
-
-
此Benchmark是針對此實驗設計的
-
但是也可以適用在IIoT環境中
-
感測器測量到的數值
-
離子監控
-
電力、電流與電壓
-
磁場相關的數值
-
監控的溫度
-
Experiment setup
-
M1硬體規格
-
Intel Xeon CPU E5-1620 v2 @ 3.70GHz of 8 logical cores
-
32GB DDR3 RAM
-
24 physical drives formatted with XFS and configured with
-
RAID60 and connected using a 6 Gbit/sec SAS-2 controller
-
用來建置DB server
-
-
M2硬體規格
-
Intel Xeon CPU E5-2680 v3 @ 2.50GHz over 2 sockets of 48 logical cores in total
-
512GB DDR4 RAM
-
扮演client machine
-
Experiment setup
-
測試的候選資料庫皆使用預設設定來運行
-
測試的資料表綱要
-
主要欄位有:timestamp, sensor_id, value
-
8 bytes for timestamp
-
8 bytes long integer as the sensor ID
-
8 bytes double-precision float for value.
-
將timestamp欄位建立索引
-
Experiment results
-
Data Ingestion資料寫入
-
Scaling Workload
-
每個目標資料庫同使併發使用48個clients
-
每個batch大小為20,000筆資料並總計2.8 billion(28億)筆資料
-
每個量測效能的資料庫皆有28億筆資料
-
ClickHouse表現出有較好的寫入效能,每秒可以寫入120萬筆資料
-
Experiment results
Experiment results
-
Queries Latency查詢延遲時間
-
填入了28億資料中進行查詢
-
總計為15天測量的資料
-
由100,000個感測器完成
-
-
每個查詢執行20次,每個查詢之間,清除資料表和系統與快取
-
確保查詢的結果都是從硬碟中查到的
-
-
Q1、以10分鐘區間找到10個不同感測器的資料出來
Experiment results
Experiment results
-
Queries Latency查詢延遲時間
-
Q2、從180分鐘區間找到範圍外(異常)的資料
-
較為複雜查詢在PostgreSQL中有瓶頸(B-Tree Index瓶頸)
-
Experiment results
Experiment results
-
Queries Latency查詢延遲時間
-
Q3、從15天中,10個不同感測器,60分鐘區間撈出資料,計算標準差
-
Experiment results
Experiment results
-
Queries Latency查詢延遲時間
-
Q4、從15天中,10個不同感測器,以小時間隔撈出一天的資料
-
Experiment results
Experiment results
-
Queries Latency查詢延遲時間
-
Q5、從15天中,2個不同感測器,以小時間隔撈出一天的資料
-
Experiment results
Conclusion
-
ClickHouse
-
有效率的儲存引擎
-
支援非常高的寫入速度(1.3 million/秒)
-
48個client數量同時併發連線之下,維持低CPU、低記憶體和硬碟I/O使用
-
查詢上有低延遲的時間,是因為有幾個原因如下
-
partitioned data management
-
sparse indexing algorithm
-
mix of compiled vectorized query execution
-
-
Conclusion
-
InfluxDB
-
LSM-Tree設計所以在寫入資料可高達每秒790,000筆資料
-
Concurrent表現出弱勢,CPU資源因textual protocol耗盡
-
Memory與disk I/O效能有效率
-
查詢延遲時間沒與ClickHouse一樣低 (LSM-Tree 犧牲read效能)
-
Log-structured merge-tree
-
-
Conclusion
-
TimescaleDB
-
與PosgreSQL比較,有很大的改良,寫入資料可每秒490,000筆資料
-
改善了系統資源使用量,與PostgreSQL比較來說也較佳
-
改良同時也影響了查詢延遲時間,查詢延遲時間也比PostgreSQL好
-
-
PostgreSQL
-
傳統的ACID設計的資料庫
-
原子性(Atomicity)
-
一致性(Consistency)
-
事務隔離(Isolation)
-
持久性(Durability)
-
-
系統資源使用量,不論是CPU資源、記憶體與disk I/O都沒有效率
-
Conclusion
-
Column-format資料有效的做到
-
資料分區並分成多個部分並擁有低延遲的查詢速率
-
對於系統資源使用量來說
-
有效率的RAM管理與使用
-
有效率的硬碟I/O使用與管理
-
Row-oriented V.S. Column-oriented
That's the reason why we choose the ClickHouse
ClickHouse installation
-
Hardware requirements
-
CPU指令集
-
SSE 4.2 support
-
grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"
-
-
4 CPU cores at least
-
8 GB RAM at least
-
ClickHouse installation
-
Ubuntu 20.04為例
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates dirmngr
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 8919F6BD2B48D754
echo "deb [arch=amd64] https://packages.clickhouse.com/deb stable main" | \
sudo tee /etc/apt/sources.list.d/clickhouse.list
sudo apt-get update
sudo apt-get install -y clickhouse-client clickhouse-server
ClickHouse configuring
-
Ubuntu 20.04為例
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates dirmngr
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 8919F6BD2B48D754
echo "deb [arch=amd64] https://packages.clickhouse.com/deb stable main" | \
sudo tee /etc/apt/sources.list.d/clickhouse.list
sudo apt-get update
sudo apt-get install -y clickhouse-client clickhouse-server
......
useradd -r --shell /bin/false --home-dir /nonexistent -g clickhouse-bridge clickhouse-bridge
chown -R clickhouse-bridge:clickhouse-bridge '/usr/bin/clickhouse-odbc-bridge'
chown -R clickhouse-bridge:clickhouse-bridge '/usr/bin/clickhouse-library-bridge'
Enter password for default user:
Password for default user is saved in file /etc/clickhouse-server/users.d/default-password.xml.
Setting capabilities for clickhouse binary. This is optional.
chown -R clickhouse:clickhouse '/etc/clickhouse-server'
ClickHouse has been successfully installed.
$ sudo systemctl is-enabled clickhouse-server.service
enabled
$ sudo systemctl status clickhouse-server.service
● clickhouse-server.service - ClickHouse Server (analytic DBMS for big data)
Loaded: loaded (/lib/systemd/system/clickhouse-server.service; enabled; vendor preset: enabled)
Active: inactive (dead)
$ sudo systemctl start clickhouse-server.service
$ sudo systemctl status clickhouse-server.service
ClickHouse client usage
-
Ubuntu 20.04為例
$ clickhouse-client --password
ClickHouse client version 23.9.3.12 (official build).
Password for user (default):
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 23.9.3 revision 54466.
Warnings:
* Maximum number of threads is lower than 30000. There could be problems with handling a lot of simultaneous queries.
peterli-VirtualBox :) select version();
SELECT version()
Query id: f261aa63-39ff-4e61-959e-978f2f09dfda
┌─version()─┐
│ 23.9.3.12 │
└───────────┘
1 row in set. Elapsed: 0.004 sec.
peterli-VirtualBox :)
Acces ClickHouse database
-
pip install clickhouse-connect
import clickhouse_connect
clickhouse_client = clickhouse_connect.get_client(
host=host,
username=username,
password=password,
port=port,
database=db_name
)
clickhouse_client.query('CREATE DATABASE IF NOT EXISTS 3e_green_ems')
table_lists = clickhouse_client.query('SHOW TABLES')
for device in table_lists.result_rows:
......
ClickHouse troubleshooting
-
Ubuntu 20.04為例
-
Warnings:
-
* Maximum number of threads is lower than 30000. There could be problems with handling a lot of simultaneous queries.
-
ClickHouse troubleshooting
-
Ubuntu 20.04為例
-
Warnings:
-
* Maximum number of threads is lower than 30000. There could be problems with handling a lot of simultaneous queries.
-
LimitNPROC=64000
-
-
[Service]
Type=notify
......
LimitCORE=infinity
LimitNOFILE=500000
LimitNPROC=64000
$ sudo systemctl daemon-reload
$ sudo systemctl restart clickhouse-server.service
$ clickhouse-client --password
ClickHouse client version 23.9.3.12 (official build).
Password for user (default):
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 23.9.3 revision 54466.
peterli-VirtualBox :) Bye.
The ClickHouse prefers dedicated server
Contributing the ClickHouse!
System architecture with ClickHouse
System architecture with ClickHouse (2)
References
-
Papers
-
SciTS: A Benchmark for Time-Series Database in ScientificExperiments and Industrial Internet of Things
-
-
ClickHouse:時序資料庫建置與運行(2022 iThome 鐵人賽文章)
-
Official ClickHouse documentation
-
More ClickHouse tutorials
-
ClickHouse hardware benchmarking
-
ClickHouse troubleshooting
Any Questions?
MOPCON 2023
By peter279k
MOPCON 2023
MOPCON 2023
- 402