Kea Performance Optimization
  • 03 Dec 2021

We have had a number of requests for suggestions about how to optimize Kea performance. These suggestions are general guidance; as always, your results will depend on your hardware and software choices, the size and complexity of your configuration, and local factors such as network latency.

Actual performance test results are now contained in separate documents. See our reports on Kea 2.0, Kea 1.7.6, and 1.4.0 vs 1.5.0.

1. Choose the right lease file backend

Every lease must be written to disk, and where it is written has a major impact on performance. The default memfile backend is the fastest: it can support more than 10 times as many leases per second as storing leases in a database. Writing a lease to a database is expensive and caps performance at less than 10% of the memfile figure. Among the database backends, PostgreSQL is the fastest for leases, followed by MySQL (~75% of the throughput of PostgreSQL), with Cassandra the slowest (30% of PostgreSQL).

The values shown are synthetic performance results the ISC QA team measured in our lab. These were full four-packet transactions (DORA for DHCPv4 or SARR for DHCPv6), and Kea was responding to simulated clients in a single sufficiently large subnet, so running out of addresses was not a factor. These numbers are higher than you will see in a realistic production network and should be regarded as a theoretical maximum.

Note that using a database backend for host reservations is not as expensive as using it for a lease backend, because the query to the database for a host reservation is less expensive than the write operation to record the lease.
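
As a minimal sketch of that split, the fragment below keeps leases in memfile while reading host reservations from MySQL. The file path, database name, user, and password are placeholder assumptions, and the // comments rely on Kea's configuration parser accepting comments in its JSON.

```json
{
  "Dhcp4": {
    // Leases stay in the default memfile backend (fastest write path).
    "lease-database": {
      "type": "memfile",
      "persist": true,
      "name": "/var/lib/kea/kea-leases4.csv"  // assumed path
    },
    // Reservations are only queried, not written, while serving clients.
    "hosts-database": {
      "type": "mysql",
      "name": "kea",           // assumed database name
      "user": "kea",           // assumed user
      "password": "changeme",  // placeholder
      "host": "localhost"
    }
  }
}
```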

If you do choose to use a MySQL database backend for lease storage, set innodb_flush_log_at_trx_commit=2 in the MySQL server configuration, which greatly improves performance.

2. Choose the right hardware

The Kea DHCPv4 and DHCPv6 server processes are CPU-intensive. In all versions prior to Kea 2.0, Kea is primarily single-threaded; there are separate threads for certain tasks, but the primary tasks (processing packets, inspecting the lease and hosts databases) are done in a single thread. Hardware with a fast clock and few cores should give better performance than many cores with a lower clock speed.

When using a database as the lease backend, disk-access performance is the key factor. In many cases, the obvious way to achieve better performance is to use NVMe or SSD storage.

Application                 | Limitation   | Optimize
DHCPv4, DHCPv6 servers      | CPU          | Fast clock speed
Lease database (e.g. MySQL) | Lease writes | Disk access speed

3. Minimize network latency between Kea components

When using Kea with a database backend of any kind, it is ideal to locate both Kea and the database on the same host to minimize latency. Given the advice above, to maximize clock speed for Kea and disk performance for the backend, you will see that you need a host with both a fast clock and high-performance storage for best performance. When using database clustering technology, it is important to measure the latency from your Kea instance to the database.
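
For illustration only, a lease database on the same host as Kea might be configured as below. The database name and credentials are placeholder assumptions; the point of the sketch is the "host" value, which keeps every lease write off the network.

```json
{
  "Dhcp4": {
    "lease-database": {
      "type": "postgresql",
      "name": "kea",           // assumed database name
      "user": "kea",           // assumed user
      "password": "changeme",  // placeholder
      "host": "localhost"      // database runs on the Kea host
    }
  }
}
```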

4. Help Kea find an available lease quickly

Kea uses a simple iterative process to find the next available lease. The more complex your configuration, the more structures it may have to search to find and confirm an address.

  • Use fewer subnets if you can.
  • Avoid shared lease backends. When multiple Kea servers share a single lease backend (e.g. with a cluster of databases serving as the lease backend with multiple Kea instances sharing the same pools of addresses for allocation), they will run into contention for assigning addresses. Multiple Kea instances will attempt to assign the next available address; only the first one will succeed and the others will have to retry.
  • One large pool is better than many smaller ones; try to keep pool utilization below 80%.
    Kea uses an iterative algorithm to find a new lease, trying leases one by one. This algorithm is very fast when your pool utilization is fairly small, but if most of your addresses are already assigned, Kea needs to do many tries before its allocation engine is able to find a lease that's available. If your pool utilization is over 90%, consider extending your subnet or using a shared network.
  • Avoid shared networks if you can. Shared networks are a powerful feature that lets Kea overlay more than one IP subnet on the same physical link. This is very useful, but it has significant performance implications. In many cases Kea has to go through a more complex decision algorithm (akin to "am I out of addresses in this subnet? let's change to a different subnet in this shared network and try again").
  • Kea does two lease lookups: one by client-id (if the client sent a client-id) and, if no lease is found, another by MAC address. If the administrator doesn't care about client-id (which is sometimes not provided by the client), that lookup can be disabled with the match-client-id setting. The default value of true means that the server will use the “client identifier” for lease lookups and “chaddr” if the first lookup returns no results. A value of false means that the server will only use the “chaddr” to search for the client’s lease. Setting match-client-id to false cuts the number of searches for a lease in half and may provide a significant performance boost. See the relevant section of the documentation for more details.
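
A minimal sketch of these tips in a DHCPv4 configuration follows: one subnet with a single large pool, and client-id lookups turned off. The subnet, pool range, and subnet id are made-up values, and the // comments assume Kea's comment-tolerant JSON parser.

```json
{
  "Dhcp4": {
    // Look up leases by chaddr only, skipping the client-id search.
    "match-client-id": false,
    "subnet4": [
      {
        "id": 1,                   // assumed subnet id
        "subnet": "10.0.0.0/16",   // assumed subnet
        // One large pool rather than many small ones.
        "pools": [ { "pool": "10.0.0.10 - 10.0.255.250" } ]
      }
    ]
  }
}
```

Only disable match-client-id if none of your clients depend on the client identifier to tell them apart.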

5. Follow these tips for host reservations

  • Searching for possible host reservations (HR) is expensive. If you use host reservations, decide which part of your address space should be used for reserved addresses and which for dynamic assignment. When an address is considered, Kea asks the database: "is this candidate address used by anyone right now?" If the answer is no, the next question is: "is this address reserved for anyone?" You can omit the second question if you tell Kea that there are no reservations for addresses from the dynamic pool. See "reservation-mode": "out-of-pool".
  • If you don't need HR, you can tell Kea not to look for HRs at all by setting "reservation-mode": "disabled". If you are using HRs in only a few subnets, disable host reservations globally and then enable them only on a per-subnet basis.
  • Eliminate unused host identifiers. Kea is able to identify host reservations in several ways: circuit-id, MAC (hardware) address, duid, or client-id in DHCPv4, or duid or MAC (hardware) address in DHCPv6. Kea checks all of those types one by one. If you know that you don't use certain types (e.g. you always identify your hosts by MAC address only), you can tell Kea not to bother trying different identifiers, e.g. "host-reservation-identifiers": [ "hw-address" ]. If you want to use several identifier types, please order the list from most to least frequent, e.g. if you have 990 reservations by MAC address and 10 reservations by client-id, please use "host-reservation-identifiers": [ "hw-address", "client-id" ].
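
The fragment below sketches how these three tips could be combined: reservations are disabled globally, enabled as out-of-pool in the one subnet that needs them, and the identifier list is trimmed and ordered. All addresses and the subnet id are invented for the example. Recent Kea releases replace reservation-mode with the boolean flags reservations-global, reservations-in-subnet, and reservations-out-of-pool, so check the reference manual for your version.

```json
{
  "Dhcp4": {
    // Check only the identifiers in use, most frequent first.
    "host-reservation-identifiers": [ "hw-address", "client-id" ],
    // No reservation lookups unless a subnet opts in.
    "reservation-mode": "disabled",
    "subnet4": [
      {
        "id": 1,                    // assumed subnet id
        "subnet": "192.0.2.0/24",   // assumed subnet
        "pools": [ { "pool": "192.0.2.10 - 192.0.2.200" } ],
        // Reserved addresses all lie outside the dynamic pool.
        "reservation-mode": "out-of-pool",
        "reservations": [
          { "hw-address": "aa:bb:cc:dd:ee:01", "ip-address": "192.0.2.201" }
        ]
      }
    ]
  }
}
```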

6. Remember that hooks add latency

Dynamically-linked Kea hooks are a powerful way to add functionality, but they can limit performance. In particular, a hook that queries an external system, such as an external database, will add latency. Depending on what the external system is, it may be a significant bottleneck. The ISC subscriber-only hook that integrates a look-up to a RADIUS database, for example, can be a significant performance drag, because RADIUS was designed for lower throughput than many administrators expect from their DHCP systems.

7. Consider the impact of high availability

With High Availability (HA) enabled, one server has to wait for its partner's confirmation before it sends a reply back to the client. This has significant performance implications.

  • Minimize the latency (round trip time) between two cooperating HA servers.

  • Active-Passive is more efficient than load-balancing.

Different modes of operation in HA may result in varying performance. There are two modes of operation for the HA enabled servers: load-balancing (a.k.a. active-active) and hot-standby (a.k.a. active-passive). In the load-balancing mode, both active servers respond to the DHCP messages sent by clients; each server processes around 50% of packets sent to the HA system. In the hot-standby mode, only one of the servers processes the received DHCP messages and responds to clients. The load-balancing mode is considered faster from the client standpoint because the load is split between two servers working in parallel. However, in the load-balancing mode both servers have to process lease updates from their partners, which has an impact on the overall performance of individual instances. In the hot-standby mode, the server designated to process the DHCP traffic does not receive lease updates from the partner, so its bandwidth for DHCP traffic processing is not impacted. The tradeoff here is that this server has to process twice as many packets, compared to the load-balancing mode.

When an HA deployment uses database replication for lease synchronization, the internal HA mechanism for such synchronization must be disabled. The "send-lease-updates" boolean configuration parameter enables or disables sending the additional lease updates to the partner. This option was designed for deployments that rely on database replication to provide redundancy but still want to take advantage of the Kea HA load-balancing and partner failure-detection mechanisms.
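
As a sketch of such a deployment, the HA hook section on one server might look like the fragment below. The library paths, server names, and URLs are assumptions for illustration; a real configuration needs matching peer definitions on both servers.

```json
{
  "Dhcp4": {
    "hooks-libraries": [
      // The HA hook relies on the lease commands hook being loaded.
      { "library": "/usr/lib/kea/hooks/libdhcp_lease_cmds.so" },  // assumed path
      {
        "library": "/usr/lib/kea/hooks/libdhcp_ha.so",            // assumed path
        "parameters": {
          "high-availability": [
            {
              "this-server-name": "server1",  // assumed name
              "mode": "load-balancing",
              // The replicated database carries the leases instead.
              "send-lease-updates": false,
              "peers": [
                { "name": "server1", "url": "http://192.0.2.1:8000/", "role": "primary" },
                { "name": "server2", "url": "http://192.0.2.2:8000/", "role": "secondary" }
              ]
            }
          ]
        }
      }
    ]
  }
}
```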

HA multi-threading update in Kea 2.0

Note that the HA hook has been refactored to remove the Control Agent (which is single-threaded) from the path in Kea 2.0. This dramatically improves HA performance.

8. Don't log at debug level in production

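Hopefully, this advice is self-explanatory: debug logging produces many messages for every packet processed, and writing them slows the server down.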
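
A production logger entry could be as small as the sketch below. The logger name shown is the root logger of the DHCPv4 server; the choice of INFO as the production level is an assumption about what suits your site.

```json
{
  "Dhcp4": {
    "loggers": [
      {
        "name": "kea-dhcp4",
        // For troubleshooting, switch temporarily to "DEBUG" and
        // set "debuglevel" (0-99), then switch back.
        "severity": "INFO"
      }
    ]
  }
}
```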

9. Note that REST operations are not "free"

Kea processes handle REST API commands synchronously. That means that for the duration of command processing, DHCP packets are not processed. This is usually not a problem for single commands, but if a command requires complex operations (such as retrieving statistics from all subnets when there are thousands of subnets configured) or many commands are sent frequently, performance degrades.
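
To make the difference concrete, compare the two command bodies sketched below. The first asks for every statistic the server keeps; the second asks for one named statistic. The subnet id is an assumption, and whether the narrower query is enough depends on what your monitoring needs.

```json
// Returns all statistics; costly when thousands of subnets are configured.
{ "command": "statistic-get-all", "service": [ "dhcp4" ] }

// Returns a single statistic for one subnet (id 1 is assumed).
{
  "command": "statistic-get",
  "service": [ "dhcp4" ],
  "arguments": { "name": "subnet[1].assigned-addresses" }
}
```

Polling less often is another way to reduce the time the server spends on commands instead of packets.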

10. Don't use Client Classification expressions to enumerate a long list of specific clients

Enumerating individual clients in class test expressions can be inefficient and grows more inefficient as the number of expressions increases. If you find yourself doing this, you should use global host reservations to assign clients to classes instead.

Kea will evaluate every test statement in a client classification operation before proceeding. Even after finding a match, Kea must continue evaluating subsequent statements, in case there are additional matches. Having an excessively long list of test cases in a classification statement can impact performance.
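
The alternative might be sketched as follows: the class is defined without a test expression, and each client is placed in it by a global host reservation. The class name and MAC addresses are hypothetical.

```json
{
  "Dhcp4": {
    // Look up reservations at the global level.
    "reservation-mode": "global",
    "client-classes": [
      // No "test" expression such as
      // "pkt4.mac == 0xaabbccddee01 or pkt4.mac == 0xaabbccddee02".
      { "name": "vip-clients" }   // hypothetical class name
    ],
    "reservations": [
      { "hw-address": "aa:bb:cc:dd:ee:01", "client-classes": [ "vip-clients" ] },
      { "hw-address": "aa:bb:cc:dd:ee:02", "client-classes": [ "vip-clients" ] }
    ]
  }
}
```

Adding another client then means adding a reservation entry rather than lengthening an expression that Kea evaluates for every packet.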

11. Avoid lengthy regular expressions

Evaluating regular expressions is expensive. If you are using the Forensic Logging feature, for example, the default log is relatively high-performing, compared to the configurable log, which requires evaluating an expression to determine what to write to the log. If you are creating a custom forensic log, use the shortest regular expression that will work for you.