Changes for clickhouse-keeper article #169
@@ -12,52 +12,71 @@ Since 2021 the development of built-in ClickHouse® alternative for Zookeeper is

See slides: https://presentations.clickhouse.com/meetup54/keeper.pdf and video https://youtu.be/IfgtdU1Mrm0?t=2682

## Current status (last updated: July 2023)
## Current status (last updated: March 2026)

Since version 23.3 we recommend using clickhouse-keeper for new installations.
ClickHouse Keeper is the recommended choice for new installations. In many cases it performs better thanks to newer features such as async replication and multi-read requests. Some ClickHouse server features, for example S3Queue, cannot be used without Keeper.

Even better if you will use the latest version of clickhouse-keeper (currently it's 23.7), and it's not necessary to use the same version of clickhouse-keeper as ClickHouse itself.
- Use the latest Keeper version available in your supported upgrade path whenever possible.
- The Keeper version doesn’t need to match the ClickHouse server version.
- Modern Keeper usually performs better than older versions because the codebase has matured significantly, new protocol feature flags have been added, and internal replication has improved.

For existing systems that currently use Apache ZooKeeper, consider migrating to clickhouse-keeper, especially if you are also planning to [upgrade ClickHouse](https://altinity.com/clickhouse-upgrade-overview/).

But please remember that on very loaded systems the change can give no performance benefits or can sometimes lead to a worse performance.
{{% alert title="Warning" color="warning" %}}
Before upgrading ClickHouse Keeper from a version older than 23.9, check the [Upgrade caveat for async_replication](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-zookeeper/clickhouse-keeper#upgrade-caveat-for-async_replication).
{{% /alert %}}

The development pace of keeper code is [still high](https://github.com/ClickHouse/ClickHouse/pulls?q=is%3Apr+keeper)
so every new version should bring improvements / cover the issues, and stability/maturity grows from version to version, so
if you want to play with clickhouse-keeper in some environment - please use [the most recent ClickHouse releases](https://altinity.com/altinity-stable/)! And of course: share your feedback :)
## How does clickhouse-keeper differ from ZooKeeper?

## How does clickhouse-keeper work?
Keeper is optimized for ClickHouse workloads and written in C++ (it can run as a single binary), so it doesn't need any external dependencies. It speaks the same **client** protocol as ZooKeeper, but the two implement different consensus protocols: ZooKeeper uses ZAB, while ClickHouse Keeper uses [eBay NuRaft](https://github.com/eBay/NuRaft), a C++ implementation of Raft core logic as a replication library, which improves the stability and performance of the base Raft protocol.

Official docs: https://clickhouse.com/docs/en/guides/sre/keeper/clickhouse-keeper/
ClickHouse Keeper can also run in embedded mode, as a separate thread inside the ClickHouse server process. This may suit testing or smaller instances, where some performance can be traded for simplicity.
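
A minimal sketch of embedded mode for a single-node test instance might look like the following. The file path, the client port `9181`, the Raft port `9234` and the storage paths are assumptions chosen for illustration, not values from this article. A single-node ensemble has no fault tolerance, so use this only for testing.

```xml
<!-- /etc/clickhouse-server/config.d/keeper.xml (hypothetical path) -->
<!-- Embedded single-node Keeper for testing only: no fault tolerance. -->
<clickhouse>
    <keeper_server>
        <!-- Client port that ClickHouse and clickhouse-keeper-client connect to (assumed value) -->
        <tcp_port>9181</tcp_port>
        <server_id>1</server_id>
        <log_storage_path>/var/lib/clickhouse/coordination/log</log_storage_path>
        <snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>
        <raft_configuration>
            <server>
                <id>1</id>
                <hostname>localhost</hostname>
                <!-- Raft port used for replication between Keeper nodes (assumed value) -->
                <port>9234</port>
            </server>
        </raft_configuration>
    </keeper_server>

    <!-- Explicit client-side connection from this server to the embedded Keeper -->
    <zookeeper>
        <node>
            <host>localhost</host>
            <port>9181</port>
        </node>
    </zookeeper>
</clickhouse>
```

The `<zookeeper>` section is spelled out here for clarity; with an embedded `<keeper_server>` it may not be strictly required.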

ClickHouse-keeper still need to be started additionally on few nodes (similar to 'normal' zookeeper) and speaks normal zookeeper protocol - needed to simplify A/B tests with real zookeeper.
## Migration and upgrade guide

To test that you need to run 3 instances of clickhouse-server (which will mimic zookeeper) with an extra config like that:
- A mixed ZooKeeper / ClickHouse Keeper quorum is not supported: the two use different consensus protocols.
- ZooKeeper snapshots and transaction logs are not format-compatible with Keeper. To migrate the data, use `clickhouse-keeper-converter`.
- If the above is too complex, you can switch to a new, empty Keeper ensemble and recreate the Keeper metadata with `SYSTEM RESTORE REPLICA`. This method takes longer, but it is suitable for smaller clusters. See the [procedure to restore multiple tables in read-only mode](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-check-replication-ddl-queue/#procedure-to-restore-multiple-tables-in-read-only-mode-per-replica).
- It is usually reasonable to migrate Keeper together with a ClickHouse upgrade, especially if your current deployment is still on older `23.x` builds.

Contributor
Do you mean that Keeper should not be older than ClickHouse?

[https://github.com/ClickHouse/ClickHouse/blob/master/tests/integration/test_keeper_multinode_simple/configs/enable_keeper1.xml](https://github.com/ClickHouse/ClickHouse/blob/master/tests/integration/test_keeper_multinode_simple/configs/enable_keeper1.xml)
### Upgrade caveat for `async_replication`

[https://github.com/ClickHouse/ClickHouse/blob/master/tests/integration/test_keeper_snapshots/configs/enable_keeper.xml](https://github.com/ClickHouse/ClickHouse/blob/master/tests/integration/test_keeper_snapshots/configs/enable_keeper.xml)
`async_replication` is an internal Keeper optimization for Raft replication, enabled by default starting from [25.10](https://github.com/ClickHouse/ClickHouse/pull/88515). It does not change ClickHouse replicated table semantics, but it can improve Keeper performance.

or event single instance with config like that: [https://github.com/ClickHouse/ClickHouse/blob/master/tests/config/config.d/keeper_port.xml](https://github.com/ClickHouse/ClickHouse/blob/master/tests/config/config.d/keeper_port.xml)
[https://github.com/ClickHouse/ClickHouse/blob/master/tests/config/config.d/zookeeper.xml](https://github.com/ClickHouse/ClickHouse/blob/master/tests/config/config.d/zookeeper.xml)
If you upgrade directly from a version older than `23.9` to `25.10+`, either:

And point all the ClickHouses (zookeeper config section) to those nodes / ports.
- upgrade Keeper to `23.9+` first, and then continue to `25.10+`, or
- temporarily set `keeper_server.coordination_settings.async_replication=0` during the upgrade and enable it again once the upgrade is finished.
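
For the second option, a config override could look like this sketch. The file name and location are assumptions: put it wherever your deployment loads Keeper config overrides from. The setting path `keeper_server.coordination_settings.async_replication` is the one named above. Remove the override, or set the value back to `1`, once every Keeper node has been upgraded.

```xml
<!-- /etc/clickhouse-server/config.d/disable_async_replication.xml (hypothetical path) -->
<!-- Temporarily disable async_replication while upgrading from a Keeper version older than 23.9 -->
<clickhouse>
    <keeper_server>
        <coordination_settings>
            <async_replication>0</async_replication>
        </coordination_settings>
    </keeper_server>
</clickhouse>
```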

Latest version is recommended (even testing / master builds). We will be thankful for any feedback.
### Keeper in Kubernetes

If you run ClickHouse on Kubernetes with the Altinity operator, Keeper can be managed as a dedicated `ClickHouseKeeperInstallation` resource (often abbreviated as CHK). That is usually the cleanest way to run and upgrade a separate Keeper ensemble on Kubernetes. See an [example](https://github.com/Altinity/clickhouse-operator/blob/master/docs/chk-examples/01-chi-simple-with-keeper.yaml).

## systemd service file

See
https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-zookeeper/clickhouse-keeper-service/
See https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-zookeeper/clickhouse-keeper-service/

## init.d script

See
https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-zookeeper/clickhouse-keeper-initd/
See https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-zookeeper/clickhouse-keeper-initd/

## More than 3 Keeper nodes

The main drawback of a larger Keeper ensemble is that leader re-election takes longer and commits take longer (each write must be acknowledged by a larger majority), which can slow down inserts and DDL queries.

A larger ensemble works, but we don’t recommend running more than three Keeper nodes (excluding observers).

Adding nodes offers no significant advantage unless you need to tolerate the simultaneous failure of two Keeper nodes. It does not improve performance (it may even make it worse), and it consumes additional resources: Keeper, like ZooKeeper, needs fast, dedicated disks to perform well, as well as some RAM and CPU.

## Example of a simple cluster with 2 nodes of ClickHouse using built-in keeper
## clickhouse-keeper-client

For example you can start two ClickHouse nodes (hostname1, hostname2)
In `clickhouse-keeper-client`, paths are now parsed more strictly and must be passed as string literals. In practice, this means wrapping paths in single quotes: for example, `ls '/'` instead of `ls /`, and `get '/clickhouse/path'` instead of `get /clickhouse/path`.

## Example of a simple cluster

The Keeper ensemble size should be odd. A quorum requires a strict majority of the nodes (more than 50%), so adding one node to an odd-sized ensemble does not let it survive any additional failures. A 2-node Keeper setup loses quorum after a single node failure, so the recommended number of Keeper replicas is 3.
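
The arithmetic behind this rule, for an ensemble of $n$ voting nodes with quorum size $q$ and $f$ tolerated simultaneous failures, is:

$$
q(n) = \left\lfloor \frac{n}{2} \right\rfloor + 1, \qquad f(n) = n - q(n) = \left\lfloor \frac{n-1}{2} \right\rfloor
$$

$$
\begin{aligned}
n = 2 &: \quad q = 2,\ f = 0 \\
n = 3 &: \quad q = 2,\ f = 1 \\
n = 4 &: \quad q = 3,\ f = 1 \\
n = 5 &: \quad q = 3,\ f = 2
\end{aligned}
$$

A 4-node ensemble therefore tolerates no more failures than a 3-node one, and only a 5-node ensemble survives two simultaneous failures.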

On `hostname1` and `hostname2` below, ClickHouse can use the embedded Keeper cluster from `<keeper_server>`, so a separate client-side `<keeper>` section is not required. If your ClickHouse servers connect to an external Keeper or ZooKeeper ensemble, see [ClickHouse config for Keeper]({{< ref "clickhouse-keeper-clickhouse-config" >}}).
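
If ClickHouse instead connects to an external three-node ensemble, the client-side section could look like this sketch. It uses the long-standing `<zookeeper>` section name. The host names reuse this example's `hostname1` to `hostname3`, and port `2181` matches the `tcp_port` of the standalone `hostname3` config; both, and the timeout value, are assumptions to adapt to your environment.

```xml
<!-- /etc/clickhouse-server/config.d/zookeeper.xml (hypothetical path) -->
<!-- Client-side connection from ClickHouse to an external Keeper (or ZooKeeper) ensemble -->
<clickhouse>
    <zookeeper>
        <node>
            <host>hostname1</host>
            <port>2181</port>
        </node>
        <node>
            <host>hostname2</host>
            <port>2181</port>
        </node>
        <node>
            <host>hostname3</host>
            <port>2181</port>
        </node>
        <!-- Client session timeout (assumed value, tune for your workload) -->
        <session_timeout_ms>30000</session_timeout_ms>
    </zookeeper>
</clickhouse>
```

Listing every ensemble member lets ClickHouse reconnect to another node if the one it is using becomes unavailable.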
SaltTan marked this conversation as resolved.

### hostname1

Contributor
Let's replace 'yandex' with 'clickhouse' in configs

@@ -76,31 +95,28 @@ $ cat /etc/clickhouse-server/config.d/keeper.xml
<operation_timeout_ms>10000</operation_timeout_ms>
<session_timeout_ms>30000</session_timeout_ms>
<raft_logs_level>trace</raft_logs_level>
<rotate_log_storage_interval>10000</rotate_log_storage_interval>
<rotate_log_storage_interval>10000</rotate_log_storage_interval>
</coordination_settings>

<raft_configuration>
<raft_configuration>
<server>
<id>1</id>
<hostname>hostname1</hostname>
<port>9444</port>
</server>
<server>
<id>2</id>
<hostname>hostname2</hostname>
<port>9444</port>
</server>
</raft_configuration>

<id>1</id>
<hostname>hostname1</hostname>
<port>9444</port>
</server>
<server>
<id>2</id>
<hostname>hostname2</hostname>
<port>9444</port>
</server>
<server>
<id>3</id>
<hostname>hostname3</hostname>
<port>9444</port>
</server>
</raft_configuration>
</keeper_server>

<zookeeper>
<node>
<host>localhost</host>
<port>2181</port>
</node>
</zookeeper>

<distributed_ddl>
<path>/clickhouse/testcluster/task_queue/ddl</path>
</distributed_ddl>

@@ -135,31 +151,28 @@ $ cat /etc/clickhouse-server/config.d/keeper.xml
<operation_timeout_ms>10000</operation_timeout_ms>
<session_timeout_ms>30000</session_timeout_ms>
<raft_logs_level>trace</raft_logs_level>
<rotate_log_storage_interval>10000</rotate_log_storage_interval>
<rotate_log_storage_interval>10000</rotate_log_storage_interval>
</coordination_settings>

<raft_configuration>
<raft_configuration>
<server>
<id>1</id>
<hostname>hostname1</hostname>
<port>9444</port>
</server>
<server>
<id>2</id>
<hostname>hostname2</hostname>
<port>9444</port>
</server>
</raft_configuration>

<id>1</id>
<hostname>hostname1</hostname>
<port>9444</port>
</server>
<server>
<id>2</id>
<hostname>hostname2</hostname>
<port>9444</port>
</server>
<server>
<id>3</id>
<hostname>hostname3</hostname>
<port>9444</port>
</server>
</raft_configuration>
</keeper_server>

<zookeeper>
<node>
<host>localhost</host>
<port>2181</port>
</node>
</zookeeper>

<distributed_ddl>
<path>/clickhouse/testcluster/task_queue/ddl</path>
</distributed_ddl>

@@ -177,7 +190,50 @@ $ cat /etc/clickhouse-server/config.d/macros.xml
</yandex>
```

### on both
### hostname3

```xml
$ cat /etc/clickhouse-keeper/keeper_config.xml

<?xml version="1.0" ?>
<clickhouse>
    <keeper_server>
        <tcp_port>2181</tcp_port>
        <server_id>3</server_id>
        <log_storage_path>/var/lib/clickhouse/coordination/log</log_storage_path>
        <snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>

        <coordination_settings>
            <operation_timeout_ms>10000</operation_timeout_ms>
            <session_timeout_ms>30000</session_timeout_ms>
            <raft_logs_level>trace</raft_logs_level>
            <rotate_log_storage_interval>10000</rotate_log_storage_interval>
        </coordination_settings>

        <raft_configuration>
            <server>
                <id>1</id>
                <hostname>hostname1</hostname>
                <port>9444</port>
            </server>
            <server>
                <id>2</id>
                <hostname>hostname2</hostname>
                <port>9444</port>
            </server>
            <server>
                <id>3</id>
                <hostname>hostname3</hostname>
                <port>9444</port>
            </server>
        </raft_configuration>
    </keeper_server>
</clickhouse>

$ clickhouse-keeper --config /etc/clickhouse-keeper/keeper_config.xml
```

### on both ClickHouse nodes

```xml
$ cat /etc/clickhouse-server/config.d/clusters.xml

@@ -213,3 +269,18 @@ insert into test select number, '' from numbers(100000000);
-- on both nodes:
select count() from test;
```

## Useful references

- Official Keeper guide:
  https://clickhouse.com/docs/en/guides/sre/keeper/clickhouse-keeper/
- `clickhouse-keeper-client`:
  https://clickhouse.com/docs/en/operations/utilities/clickhouse-keeper-client
- Keeper HTTP API and dashboard (26.1+):
  https://clickhouse.com/docs/operations/utilities/clickhouse-keeper-http-api
- `system.zookeeper_connection`:
  https://clickhouse.com/docs/en/operations/system-tables/zookeeper_connection
- `system.zookeeper_connection_log`:

Contributor
There are also:

  https://clickhouse.com/docs/en/operations/system-tables/zookeeper_connection_log
- Altinity operator CHK examples:
  https://github.com/Altinity/clickhouse-operator/tree/master/docs/chk-examples

Maybe a reminder about the other metadata?
clickhouse-keeper-converter. For example: Distributed DDL queue, RBAC data (if configured), etc. We can link Misha's article here (#165)