Getting started with Galera or Percona for Kea
  • 21 Oct 2022
  • 10 Minutes to read

This document provides instructions for setting up a working database cluster for use as a Kea lease backend, or any other type of backend.

The database can be a single point of failure

Be aware that Kea can currently connect to only a single database host, making that host a single point of failure. Follow the progress of the multiple contact points issue, which aims to mitigate this.

Interested in Anycast?

Many of the points discussed in the Kea development wiki article on Anycast HA considerations apply to cluster backend setups as well.

Cluster solutions

The clustered backend will have to replace two built-in features of the Kea HA solution: lease updates and lease syncing. MySQL has two popular extensions for clustering: Galera and Percona XtraDB Cluster, which is based on Galera. Both have a number of features that make them work well with Kea:

  • active-active failover
    All participating nodes respond to requests, similar to Kea HA load-balancing, as opposed to active-passive which is more similar to Kea HA hot-standby.

  • multi-master replication
    Data is replicated to all client-facing nodes; each of these nodes is sufficient in coordinating data consistency across the cluster and does not need secondary nodes to achieve this.

  • virtually synchronous replication
    Write-sets are recorded via row logging and replicated to all nodes synchronously at commit time, so that all writes are totally ordered across the cluster; each node may apply them slightly later, but every read sees the same consistent view of the data.

  • replication factor == node count
    All nodes are self-sufficient to answer any request and they contain data received through all the nodes.

  • quorum
    During operation, the cluster keeps track of the responsive nodes which form the primary component. When one of the nodes unexpectedly goes down, if the nodes that remain form a majority quorum relative to the initial cluster size, then they keep functioning. If not, the primary component is disbanded and no nodes will operate until a majority is formed again. If there is a network partition, the same majority quorum principle applies. If the partition splits the cluster in exactly half, to avoid split-brain, neither of the halves will be operational since they don't satisfy quorum.

  • effortless node opt-out
    If a node leaves the cluster unexpectedly, a network partition may have occurred, in which case some nodes might still be offering service, leading to a split-brain scenario; this is why a quorum needs to be instituted. When a node leaves the cluster gracefully, however, there is a guarantee that the node is offline and not receiving statements. So instead of only the primary component shrinking, as is the case for unexpected communication cut-offs, the entire cluster size decreases. This increases the tolerance to subsequent node failures, since a smaller cluster size requires fewer nodes to be online to form a quorum. Graceful departure is also convenient for maintenance or upgrades: to replace a drive, you can simply stop the node, swap the drives, and start the node again.

  • automatic recovery
    When a node comes back up, it replicates data from the functioning nodes and then becomes available itself. Data conflicts are not expected in this scenario.
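The majority rule described under quorum can be illustrated with a small shell sketch (a hypothetical helper for illustration only; a real cluster reports this state through the wsrep_cluster_status variable):

```shell
# has_quorum CLUSTER_SIZE LIVE_NODES
# A partition stays operational only if it holds a strict majority of the
# original cluster size; an exact half is not enough (split-brain protection).
has_quorum() {
  local size=$1 live=$2
  if [ $((live * 2)) -gt "${size}" ]; then
    echo yes
  else
    echo no
  fi
}

has_quorum 3 2   # yes: two of three nodes form a majority
has_quorum 4 2   # no: an exact half cannot avoid split-brain
```

This also shows why a graceful leave helps: it shrinks CLUSTER_SIZE itself, so fewer live nodes are needed to keep quorum afterwards.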

Additional Percona features

In addition to the features above, Percona offers the following.

  • XtraBackup
    XtraBackup is an SST (State Snapshot Transfer) tool that is aware of database entities like tables and rows (as opposed to rsync, which copies data byte by byte). It can be useful in reconstructing damaged databases. It needs to be explicitly configured.

  • Specific threading model
    To improve scalability, Percona uses different threads for different purposes. According to the Percona documentation these include worker threads (wsrep_slave_threads), the rollback thread, the service thread, gcs_recv_thread, the gcomm connection thread, donor/joiner/pagestore threads and others.

  • More monitoring
    Instruments are exposed in the PERFORMANCE_SCHEMA for the user to monitor.

  • Better security
    If SSL is used, Percona encrypts the stored data.
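As a hedged sketch, selecting XtraBackup as the SST method is typically done with a wsrep_sst_method entry in the server configuration; the user name and password below are placeholders, and wsrep_sst_auth applies to Percona XtraDB Cluster 5.7 (check the Percona documentation for your version):

```
[mysqld]
wsrep_sst_method=xtrabackup-v2
wsrep_sst_auth=sstuser:sstpassword
```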


Installing the packages

These instructions were verified using the following versions:

  • Debian 10.8: Galera 25.3.25-2, Percona 8.0.22-13
  • Ubuntu 20.04.2 LTS: Galera 25.3.29-1, Percona 8.0.22-13
  • Fedora 33: Galera 10.4.18-1, Percona 8.0.22-13

For Galera:

# deb
$ apt install -y galera-3 mariadb-server

# RPM
$ yum install -y mariadb-server-galera

For Percona:

# deb
$ wget https://repo.percona.com/apt/percona-release_latest.$(lsb_release -sc)_all.deb
$ dpkg -i ./percona-release_latest.$(lsb_release -sc)_all.deb
$ apt update
$ apt install -y percona-xtradb-cluster-server-5.7

# RPM
$ yum install -y https://repo.percona.com/yum/percona-release-latest.noarch.rpm
$ yum module disable mysql
$ percona-release setup ps80
$ yum install -y Percona-Server-server-57

Setting up a cluster

Setting up the cluster is done in a similar way for both Galera and Percona.

Retrieve some values needed in the configuration:

configuration location

# deb
$ config=/etc/mysql/mariadb.conf.d/60-cluster.cnf

# RPM
$ config=/etc/my.cnf.d/cluster.cnf

outside-facing interface address

$ this_node_address=$(ip a s $(ip route | grep -E '^default' | grep -Eo 'dev [a-z0-9]*' | cut -d ' ' -f 2) | grep -F ' inet ' | tr -s ' ' | cut -d ' ' -f 3 | cut -d '/' -f 1)
$ first_node_address= ... # Choose a first node to act as the cluster initializer and get its address.
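To see what the pipeline above is doing, here is its device-extraction step run against a sample line of ip route output (the sample route below is made up for illustration):

```shell
# A typical default-route line as printed by `ip route`.
sample='default via 192.168.1.1 dev eth0 proto dhcp metric 100'

# Extract the outside-facing device name, as in the pipeline above.
dev=$(echo "${sample}" | grep -E '^default' | grep -Eo 'dev [a-z0-9]*' | cut -d ' ' -f 2)
echo "${dev}"   # eth0
```

The full pipeline then asks `ip a s` for that device's address and strips the prefix length.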

the Galera library

For Galera:

# deb
$ wsrep_provider=$(dpkg-query -L galera-3 | grep -E 'libgalera_smm\.so$' | head -n 1)

# RPM
$ wsrep_provider=$(yum repoquery --installed -l galera | grep -E 'libgalera_smm\.so$' | head -n 1)

For Percona:

# deb
$ wsrep_provider=$(dpkg-query -L percona-xtradb-cluster-server-5.7 | grep -E 'libgalera_smm\.so$' | head -n 1)

# RPM
$ wsrep_provider=$(yum repoquery --installed -l Percona-Server-server-57 | grep -E 'libgalera_smm\.so$' | head -n 1)

Create a minimal configuration.

For all nodes, run the following command:

$ cat > "${config}" <<HERE_DOCUMENT
[mysqld]
binlog_format=ROW
innodb_autoinc_lock_mode=2
wsrep_cluster_address=gcomm://${first_node_address}:4567
wsrep_cluster_name=MyCluster
wsrep_node_address=${this_node_address}
wsrep_on=ON
wsrep_provider=${wsrep_provider}
HERE_DOCUMENT

Initialize the first node.

To start the cluster, you must initialize the first node differently from the subsequent nodes. The first node needs an empty cluster address: wsrep_cluster_address=gcomm://. If it referred to itself, the node would complain that it cannot connect to the cluster. Make sure to apply this change on the first node only.

This can be accomplished with this command:

$ sed -i 's#^wsrep_cluster_address=.*#wsrep_cluster_address=gcomm://#g' "${config}"

(On subsequent nodes, wsrep_cluster_address should keep pointing to the first node.)

Then proceed to initialize the cluster and start MariaDB on the first node:

$ systemctl restart mysql
$ mysql --execute='SHOW GLOBAL STATUS' | grep -E 'wsrep_cluster_size|wsrep_local_state_comment|wsrep_ready'
wsrep_cluster_size      1
wsrep_local_state_comment       Synced
wsrep_ready     ON

Start the cluster on the other nodes.

$ systemctl restart mariadb
$ mysql --execute='SHOW GLOBAL STATUS' | grep -E 'wsrep_cluster_size|wsrep_local_state_comment|wsrep_ready'
wsrep_cluster_size      2
wsrep_local_state_comment       Synced
wsrep_ready     ON

As you start up nodes, wsrep_cluster_size should increase, showing the number of nodes that have joined the cluster so far.
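When scripting the rollout, it can be convenient to block until the cluster reaches its expected size before continuing. A minimal sketch (the polling command is passed in as arguments so the helper stays generic; in real use it would be the mysql invocation shown above):

```shell
# wait_for_cluster_size EXPECTED CMD [ARGS...]
# Polls CMD until it prints EXPECTED, e.g.:
#   wait_for_cluster_size 3 sh -c \
#     "mysql -N --execute=\"SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size'\" | cut -f 2"
wait_for_cluster_size() {
  local expected=$1; shift
  until [ "$("$@")" = "${expected}" ]; do
    sleep 1
  done
}
```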

Testing that Replication Works

Create a database.

On any node:

$ for statement in                                                            \
  "CREATE USER IF NOT EXISTS 'keatest' IDENTIFIED BY 'keatest';"            \
  'GRANT ALL ON *.* TO keatest;'                                            \
  'CREATE DATABASE IF NOT EXISTS keatest;'                                  \
  "CREATE USER IF NOT EXISTS 'keatest_readonly' IDENTIFIED BY 'keatest';"   \
  'GRANT SELECT ON keatest.* TO keatest_readonly;'                          \
  ; do
  mysql --execute="${statement}"
done

Initialize tables.

$ kea-admin db-init mysql

Insert a dummy lease.

$ mysql --database=keatest --execute="INSERT INTO lease6 (address, subnet_id, lease_type) VALUES ('2001:db8::1', 1, 1)"

Retrieve it on another node.

$ mysql --database=keatest --execute='SELECT * FROM lease6'
address duid    valid_lifetime  expire  subnet_id       pref_lifetime   lease_type      iaid    prefix_len      fqdn_fwd        fqdn_rev        hostname        hwaddr  hwtype  hwaddr_source   state   user_context
2001:db8::1     NULL    NULL    2021-03-16 15:47:42     1       NULL    1       NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    0       NULL

As you can see, DDL statements as well as DML and DQL are synchronized across the cluster.

Using Docker

Galera

Run the first node.

$ docker network create 'galera-network' --subnet=10.100.0.0/16
$ docker run -d \
  -e 'MARIADB_GALERA_CLUSTER_NAME=MyCluster' \
  -e 'MARIADB_GALERA_MARIABACKUP_USER=keatest' \
  -e 'MARIADB_GALERA_MARIABACKUP_PASSWORD=keatest' \
  -e 'MARIADB_ROOT_PASSWORD=keatest' \
  -e 'MARIADB_USER=keatest' \
  -e 'MARIADB_PASSWORD=keatest' \
  -e 'MARIADB_DATABASE=keatest' \
  -e 'MARIADB_REPLICATION_USER=keatest' \
  -e 'MARIADB_REPLICATION_PASSWORD=keatest' \
  --name 'galera' \
  --network 'galera-network' \
  -p '3306:3306' \
  -p '4444:4444' \
  -p '4567:4567' \
  -p '4568:4568' \
  -v '/tmp/mysql:/opt/bitnami/mariadb/tmp' \
  'bitnami/mariadb-galera:latest'

Run all the other nodes.

$ for i in $(seq 2 ${node_count}); do
  p=$((i - 1))
  docker run -d \
    -e 'MARIADB_GALERA_CLUSTER_NAME=MyCluster' \
    -e 'MARIADB_GALERA_CLUSTER_ADDRESS=gcomm://galera:4567,0.0.0.0:4567' \
    -e 'MARIADB_GALERA_MARIABACKUP_USER=keatest' \
    -e 'MARIADB_GALERA_MARIABACKUP_PASSWORD=keatest' \
    -e 'MARIADB_ROOT_PASSWORD=keatest' \
    -e 'MARIADB_REPLICATION_USER=keatest' \
    -e 'MARIADB_REPLICATION_PASSWORD=keatest' \
    --name "galera-${i}" \
    --network 'galera-network' \
    -p "${p}3306:3306" \
    -p "${p}4444:4444" \
    -p "${p}4567:4567" \
    -p "${p}4568:4568" \
    -v "/tmp/mysql:/opt/bitnami/mariadb/tmp" \
    'bitnami/mariadb-galera:latest'
done

If you're running all the Docker containers on the same machine, make sure the nodes do not overwrite each other's /tmp/mysql directory; that is where each node's Unix socket resides. For the subsequent nodes, you can mount /tmp/mysql-${i} instead.

Now you can connect with:

$ mysql --user=root --password=keatest --socket=/tmp/mysql/mysqld.sock

Percona

Generate an SSL certificate.

By default, the docker image built by Percona requires SSL encryption for inter-node traffic.

$ mkdir -p /tmp/percona-xtradb-cluster.conf.d/{cert,config}
$ cat > /tmp/percona-xtradb-cluster.conf.d/config/custom.cnf <<HERE_DOCUMENT
[mysqld]
ssl-ca = /cert/ca.pem
ssl-cert = /cert/server-cert.pem
ssl-key = /cert/server-key.pem

[client]
ssl-ca = /cert/ca.pem
ssl-cert = /cert/client-cert.pem
ssl-key = /cert/client-key.pem

[sst]
encrypt = 4
ssl-ca = /cert/ca.pem
ssl-cert = /cert/server-cert.pem
ssl-key = /cert/server-key.pem
HERE_DOCUMENT
$ docker run \
    --name pxc-cert \
    -v '/tmp/percona-xtradb-cluster.conf.d/cert:/cert' \
    'percona/percona-xtradb-cluster:latest' mysql_ssl_rsa_setup -d /cert

Start the first node.

$ docker network create 'percona-network' --subnet=10.200.0.0/16
$ docker run -d \
    --cap-add sys_nice \
    -e 'CLUSTER_NAME=cluster' \
    -e 'MYSQL_ROOT_PASSWORD=keatest' \
    --name 'percona' \
    --network 'percona-network' \
    -p '3306:3306' \
    -p '4444:4444' \
    -p '4567:4567' \
    -p '4568:4568' \
    -v '/tmp/mysql:/tmp' \
    -v '/tmp/percona-xtradb-cluster.conf.d/cert:/cert' \
    -v '/tmp/percona-xtradb-cluster.conf.d/config:/etc/percona-xtradb-cluster.conf.d' \
    'percona/percona-xtradb-cluster:latest'

Start the other nodes.

$ for i in $(seq 2 ${node_count}); do
    p=$((i - 1))
    docker run -d \
        --cap-add sys_nice \
        -e 'CLUSTER_JOIN=percona' \
        -e 'CLUSTER_NAME=cluster' \
        -e 'MYSQL_ROOT_PASSWORD=keatest' \
        --name "percona-${i}" \
        --network 'percona-network' \
        -p "${p}3306:3306" \
        -p "${p}4444:4444" \
        -p "${p}4567:4567" \
        -p "${p}4568:4568" \
        -v '/tmp/mysql:/tmp' \
        -v '/tmp/percona-xtradb-cluster.conf.d/cert:/cert' \
        -v '/tmp/percona-xtradb-cluster.conf.d/config:/etc/percona-xtradb-cluster.conf.d' \
        'percona/percona-xtradb-cluster:latest'
done

If you're running all the Docker containers on the same machine, make sure the nodes do not overwrite each other's /tmp/mysql directory; that is where each node's Unix socket resides. For the subsequent nodes, you can mount /tmp/mysql-${i} instead.

Now you can connect with:

$ mysql --user=root --password=keatest --socket=/tmp/mysql/mysqlx.sock

Kea database connection recovery is useful here

You might be interested in enabling the database reconnection feature for Kea so that Kea can reconnect to a database node that has freshly returned after a failure. See Kea documentation for how to enable it.
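As a sketch, the relevant knobs live in the lease-database entry of the kea-dhcp4/kea-dhcp6 configuration: max-reconnect-tries and reconnect-wait-time (in milliseconds) control the retry behavior. The host and credentials below are placeholders matching the examples in this document; consult the Kea documentation for the exact parameter set supported by your version:

```
"lease-database": {
    "type": "mysql",
    "name": "keatest",
    "host": "127.0.0.1",
    "user": "keatest",
    "password": "keatest",
    "max-reconnect-tries": 10,
    "reconnect-wait-time": 5000
}
```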

Test results

Testing verified that Galera and Percona clusters work correctly for Kea lease allocation.

Our tests were conducted with a cluster of 3 nodes, each with one Kea server connected to it. We sent requests at a slow rate of 1 request per second, each from a different client, identified by its client ID / DUID.

Because Kea allocates leases sequentially, it selected the same address for multiple clients in the first exchange. A lease is not committed to the database with an UPDATE until the second exchange. As our tests show, there are no duplicates in the second exchange: the allocation resulting in the UPDATE is retried with a different address until it succeeds. Since that is the address the client uses thereafter, it all works out.

### 3 x Galera nodes ###

# 3 x kea-dhcp4 servers
Duplicates for DISCOVER-OFFER (119/356, 33% with 99% appearance)
No duplicates for REQUEST-ACK (0/0, 0%).

# 3 x kea-dhcp6 servers, IA_NA
Duplicates for SOLICIT-ADVERTISE (118/355, 33% with 99% appearance)
No duplicates for REQUEST-REPLY (0/355, 0%).

# 3 x kea-dhcp6 servers, IA_PD
Duplicates for SOLICIT-ADVERTISE (118/355, 33% with 99% appearance)
No duplicates for REQUEST-REPLY (0/355, 0%).

### 3 x Percona nodes ###

# 3 x kea-dhcp4 servers
Duplicates for DISCOVER-OFFER (118/352, 33% with 99% appearance)
No duplicates for REQUEST-ACK (0/0, 0%).

# 3 x kea-dhcp6 servers, IA_NA
Duplicates for SOLICIT-ADVERTISE (117/347, 33% with 98% appearance)
No duplicates for REQUEST-REPLY (0/347, 0%).

# 3 x kea-dhcp6 servers, IA_PD
Duplicates for SOLICIT-ADVERTISE (114/231, 49% with 98% appearance)
No duplicates for REQUEST-REPLY (0/231, 0%).

The appearance percentage is (number of duplicate allocations / number of distinct leases * 100)%, capped at (number of clients * 100)%. So 33% lease duplication with a 99% appearance means that a third of the leases were duplicated, and approximately all of those were allocated to approximately all of the 3 clients. Tests were also carried out at increased request rates to stress the Kea + cluster system, and there were still no duplicates in the second exchange.