
Kea 1.5 Preliminary Performance Testing

  • Updated on 13 Feb 2019

Introduction

This report presents a series of measurements on Kea 1.5.0, taken using ISC's performance laboratory (perflab). The measurements were carried out in late December 2018/early January 2019.

Configuration

  1. perfdhcp and the Kea server each ran on their own machine. Each machine was a Dell R430 server with 128 GB RAM, a single Xeon CPU E5-2680 v3 @ 2.50GHz, and an Intel X710 10 Gbps NIC for test traffic. The CPU is 12-core, but hyperthreading is disabled (after experiments to improve the repeatability of results). The systems are running Fedora 27.
  2. Kea was set up with a single subnet. For IPv4, the pool was set to some 262,000 addresses. For IPv6, it was set to over 4 billion addresses. Since the maximum lease rate observed is about 8,000 leases/second and each test ran for 30 seconds, this is sufficient to ensure that the pool is never filled during a test (see the next section for the definition of a test).
  3. Shared subnets were not used.
  4. No host reservations were defined, either in the configuration file or in the database.
  5. Congestion control was disabled.
  6. For the tests with the database backends, both Kea and the database server were running on the same machine. (Performance may be very different if the database and Kea are not running on the same host. Also, the overall performance may be much higher if more than one Kea instance is connected to the same database.)
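The pool-size reasoning in item 2 can be checked with simple arithmetic. This sketch uses only the figures quoted above (the 8,000 leases/second peak rate and 30-second test length):

```python
# Verify that the IPv4 pool cannot be exhausted within a single test.
peak_lease_rate = 8_000   # leases/second, the maximum rate observed
test_duration = 30        # seconds per test
pool_size_v4 = 262_000    # approximate size of the IPv4 pool

leases_per_test = peak_lease_rate * test_duration
print(leases_per_test)                 # worst-case leases issued in one test
print(leases_per_test < pool_size_v4)  # the pool is never filled
```

The IPv6 pool, at over 4 billion addresses, is larger by several orders of magnitude.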

Method

All tests were run against Kea 1.5.0 (git commit a0d3d9729b506eaa4674e5bd8b25b87d84d2492d).

  1. The server was started before each measurement run began and stopped after it ended. Within each run ten tests were carried out, with each test consisting of running perfdhcp for 30 seconds and recording the results. Kea was not restarted between tests in a run. At least three measurement runs were done for each request rate, the mean of all the results being used as the lease rate.
  2. perfdhcp was set up to simulate one million clients.
  3. As perfdhcp, Kea's DHCP performance measurement tool, is not able to automatically home in on maximum performance of the system under test, the rate was manually set, and each rate was the subject of at least three measurement runs.
  4. Kea 1.5.0 introduced the congestion handling feature. For the measurements reported here, it was disabled.
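The averaging described in step 1 can be sketched as follows. The sample numbers are hypothetical; the real harness records the lease rate reported by perfdhcp for each 30-second test:

```python
# Each measurement run consists of ten 30-second tests; the lease rate
# reported for a given request rate is the mean over all tests in all runs.
from statistics import mean

# Hypothetical lease-rate samples: three runs of ten tests each.
runs = [
    [498, 503, 501, 499, 502, 500, 497, 504, 501, 500],
    [495, 501, 500, 502, 499, 498, 503, 500, 501, 499],
    [500, 502, 498, 501, 500, 499, 502, 500, 498, 501],
]

all_tests = [rate for run in runs for rate in run]
lease_rate = mean(all_tests)  # mean of all 30 tests
print(round(lease_rate, 1))
```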

Specific points to note (and the reason why this is a preliminary report):

  1. The rate at which packets were being sent to the server was calculated from the perfdhcp output. (The current version of perfdhcp has a bug that causes it to send requests at a rate below the one specified on the command line. It does, however, log the number of packets sent during the test, and this is the information used to calculate the packet rate.)
  2. The MySQL, PostgreSQL, and Cassandra databases were started before a run and stopped afterwards. After being started, the database schema was set up from scratch, and before each test in a run the lease tables were cleared. With the memfile backend, the lease file (if present) was deleted before each run, so the first test started with an empty lease database; however, for various reasons the memfile lease tables were not cleared between the tests in a run. As a result, the results for the database backends are for the issuing of new leases, whereas those for memfile are effectively for lease renewals (a request from a client for a lease causes Kea to find that a lease has already been assigned to that client). It is thought that this causes the lease rates reported for the memfile backend to be slightly different from what they would be for new lease requests.
  3. The first version of the software used for this report does not ignore the first test in each run. It has been found that in some cases, the first test within a run can have a lease rate significantly different from that reported for the other tests. It is conjectured that this is due to the Kea server and/or database server doing the initial allocation of appropriate data structures, but the reason is not known for certain.
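The rate calculation in point 1 amounts to dividing the logged packet count by the test duration. A sketch, with a hypothetical packet count:

```python
# perfdhcp under-sends relative to the requested rate, so the actual
# request rate is derived from the number of packets it logs as sent.
packets_sent = 14_850   # hypothetical count taken from the perfdhcp output
test_duration = 30      # seconds

request_rate = packets_sent / test_duration
print(request_rate)     # 495.0 packets/second, not the command-line rate
```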

Results

For each combination of Kea protocol (DHCPv4 and DHCPv6) and backend, the following graphs are presented:

  1. Lease rate (the rate at which the server hands out leases, measured in leases/second) against the request rate (DISCOVER packets/second for DHCPv4, SOLICIT packets/second for DHCPv6). This is effectively the performance of the server.
  2. Failed requests as a percentage of requests sent as a function of the request rate. In other words, the percentage of DISCOVER or SOLICIT packets sent to the server that did not result in a lease being allocated.
  3. Initial exchange drop fraction as a function of the request rate. This is the percentage of packets in the first handshake that got no response, i.e. for DHCPv4, the percentage of DISCOVER packets that did not result in an OFFER being received. (For DHCPv6, it is the percentage of SOLICIT packets that did not result in a received ADVERTISE.)
  4. Confirmation exchange drop fraction as a function of the request rate. This is the percentage of packets in the second handshake that got no response, i.e. for DHCPv4, the percentage of REQUEST packets that did not result in an ACK being received. (For DHCPv6, it is the percentage of REQUEST packets that did not result in a received REPLY.)
  5. Initial exchange round-trip time as a function of the request rate. This is the interval in milliseconds between sending a DISCOVER or SOLICIT and receiving the corresponding OFFER or ADVERTISE.
  6. Confirmation exchange round-trip time as a function of the request rate. This is the interval in milliseconds between sending a REQUEST and receiving the corresponding ACK or REPLY.
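The listed quantities can all be derived from per-exchange packet counts for a test. The counts below are hypothetical, chosen only to illustrate the arithmetic for the DHCPv4 case:

```python
# Derive the graphed quantities from DHCPv4 packet counts for one test.
discovers_sent = 15_000   # hypothetical counts for a 30-second test
offers_received = 14_700
requests_sent = 14_700
acks_received = 14_650
duration = 30             # seconds

lease_rate = acks_received / duration                      # graph 1
failed_pct = 100 * (1 - acks_received / discovers_sent)    # graph 2
initial_drop_pct = 100 * (1 - offers_received / discovers_sent)  # graph 3
confirm_drop_pct = 100 * (1 - acks_received / requests_sent)     # graph 4

print(round(lease_rate, 1), round(failed_pct, 2),
      round(initial_drop_pct, 2), round(confirm_drop_pct, 2))
```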

(In the illustrations that follow, these graphs are in order of top to bottom, left to right.)

Instead of using the rate specified to perfdhcp, the request rate is calculated from the report of packets sent by the program in the 30 seconds of the test. (The version of perfdhcp used for the test did not always send packets at the requested rate.) The request rate in the graphs is binned in intervals of 10 packets/second to make the graphs smoother. For the packet rates in each bin, the mean is plotted; the band around the line represents the 95% confidence interval.
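The binning and confidence bands can be reproduced with standard formulas. This is a sketch using the normal approximation for the interval; the sample data are made up:

```python
# Bin (request_rate, lease_rate) samples into 10-packets/second buckets and
# compute the mean and a 95% confidence interval for each bin.
from collections import defaultdict
from statistics import mean, stdev

samples = [(497.2, 490.0), (498.8, 492.5), (503.1, 495.0),
           (505.6, 494.0), (509.9, 496.5)]  # hypothetical (request, lease) rates

bins = defaultdict(list)
for req_rate, lease_rate in samples:
    bins[int(req_rate // 10) * 10].append(lease_rate)

for bin_start, rates in sorted(bins.items()):
    m = mean(rates)
    # 1.96 standard errors gives the half-width of a 95% interval
    # (normal approximation; reasonable for the sample counts in the graphs).
    half = 1.96 * stdev(rates) / len(rates) ** 0.5 if len(rates) > 1 else 0.0
    print(f"{bin_start}-{bin_start + 9}: {m:.1f} +/- {half:.1f}")
```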

MySQL

This section presents results for Kea running with a MySQL backend, version 5.7.21. As noted above, the database was initialized from scratch at the start of each run, and cleared between tests.

DHCPv4
[Graphs: Kea-dhcp4 with MySQL backend]

Here, the lease rate increases more or less linearly with the request rate up to about 500 to 510 leases/second. (The binning of measurements into buckets of 10 requests/second is responsible for the apparent contradiction, at some points on the graph, of the lease rate being slightly higher than the request rate.) At this point, Kea is saturated and the lease rate flattens. As the request rate approaches this value, the number of failed requests starts increasing. This graph is more or less linear, as would be expected: if Kea services requests at a constant rate, then as the request rate rises the excess packets are dropped and contribute to this graph.

The bulk of the packets dropped are the initial DISCOVER packets. Interestingly, the fraction of REQUEST packets sent that do not receive a reply is much lower than the fraction of DISCOVER packets not receiving a response. This may be because with a DISCOVER, the candidate address has to be selected, so Kea has to go through the full allocation procedure. With a REQUEST, Kea just has to check that the address is still free (which, given the size of the pools, it will be) and assign it. In other words, REQUEST processing is usually much faster than DISCOVER processing.

The "knee" on the "Confirmation Exchange Drop Fraction" graph (which plots the fraction of REQUEST packets that receive no ACK) may not be real, but rather a measurement fluctuation caused by the relatively small number of packets lost in this way. However, it does also appear on the corresponding graph for DHCPv4 used with Cassandra.

The final pair of graphs shows that the round-trip time of packets increases sharply as Kea reaches saturation. Again, this is expected. While Kea is processing requests faster than they arrive, a received packet is processed almost immediately. When Kea starts falling behind, packets are queued in the system's receive buffer and have to wait before being processed. At saturation, the queue is full all the time, so every packet has to wait for Kea to process a queue-length's worth of packets before it is processed, which explains the relatively constant RTT at high packet rates. The transition occurs where Kea is processing packets almost as fast as they arrive: here the queue gradually builds up and round-trip times increase. Each test is only 30 seconds long, so in some measurements the queue is not full for a significant part of the measurement period; this accounts for the RTT increasing gradually rather than in a step.
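The queueing argument can be illustrated numerically. This is a toy sketch, not a model of Kea's actual socket buffer; both numbers are assumptions chosen for the example:

```python
# At saturation, every packet waits behind a full queue, so the round-trip
# time approaches queue_length / service_rate instead of 1 / service_rate.
service_rate = 500   # packets/second Kea can process (assumed)
queue_length = 100   # packets the receive buffer can hold (assumed)

service_time_ms = 1000 / service_rate                  # per-packet time when idle
saturated_rtt_ms = 1000 * queue_length / service_rate  # wait behind a full queue

print(service_time_ms, saturated_rtt_ms)  # 2.0 ms vs. 200.0 ms
```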

DHCPv6
[Graphs: Kea-dhcp6 with MySQL backend]

The graphs for Kea/DHCPv6 are similar, although there are a number of differences:

  1. The peak lease rate is higher, reaching around 530 to 540 leases/second (compared with Kea/DHCPv4's peak of about 500 to 510 leases/second).
  2. The behavior once the request rate increases beyond the peak is different; instead of the lease rate flattening out, the lease rate starts to decline under higher load.
  3. The largest fraction of lost requests is due to the initial exchange not being completed. (Again, it is thought that this is an artifact of perfdhcp.)

PostgreSQL

This section presents results for Kea running with a PostgreSQL backend, version 9.6.6. As mentioned earlier, the database was initialized from scratch at the start of each run, and cleared between tests.

DHCPv4
[Graphs: Kea-dhcp4 with PostgreSQL backend]

As noted, this is a preliminary report and measurements are still underway. The lease rate increases linearly with the request rate, although it shows signs of tailing off at around 770 to 780 leases/second. This is reinforced by the packet-loss and RTT graphs which, although not yet at their peak, are starting to rise significantly.

DHCPv6
[Graphs: Kea-dhcp6 with PostgreSQL backend]

The lease rate has reached a maximum of around 800 leases/second, although there are not enough measurements to show how the server behaves as it gets saturated (i.e. whether it flattens off or the performance drops). As with other configurations, the packet-loss rate and round-trip time start to rise sharply as this limit is approached.

Cassandra

This section presents results for Kea running with a Cassandra backend, version 3.11.3. (This was running with the cpp-driver V2.9 and Java 8.) As with the other databases, the database was initialized from scratch at the start of each run and cleared between tests.

DHCPv4
[Graphs: Kea-dhcp4 with Cassandra backend]

The graphs show no real surprises, being very similar to those for MySQL. The maximum lease rate is in the region of 182 to 187 leases/second although, as with other backends, the number of lost requests starts to rise as soon as saturation is reached.

DHCPv6
[Graph: lease rates for repeated runs of Kea-dhcp6 with a Cassandra backend]

No results are presented for Kea6 running with Cassandra because of some very strange behavior exhibited during the tests. The above graph, showing a number of runs of perfdhcp with Kea6 and a Cassandra backend, illustrates this. Nothing was changed between the runs plotted: the same version of the software was run and the same query rate was specified for perfdhcp. Before every run the database was started and initialized; between tests within the run, the database was cleared; and after the run was complete, the database was stopped. The results, however, are distinctly bimodal: the reported performance at this request rate is either around 230 leases/second or just over 40 leases/second, with little variance within each mode. There appears to be no pattern as to why the system locks into one mode or the other.

What makes this behavior puzzling is that if it is a Cassandra problem, why is it not seen with Kea4/Cassandra? If it is a Kea problem, why is it not seen with Kea6 running with other database backends? At any rate, until this is resolved, it is felt that it is not possible to give reliable performance figures for this combination.

Memfile (no persistence)

This section presents results for Kea running with a memfile backend, but not persisting the leases to disk. In this configuration, the data will be lost as soon as Kea is restarted or reconfigured. This scenario is run mostly for internal purposes, as a benchmark for internal packet processing and allocation engine efficiency. It does not reflect most realistic deployments.

For various reasons, unlike the database backends, the internal memory database was not cleared between tests in a run. This means that the measurements presented here are essentially those of lease renewals, rather than lease grants. A future report will present results with the database cleared between each test.

DHCPv4
[Graphs: Kea-dhcp4 with memfile backend, no persistence]

At the time of writing, the maximum throughput for Kea-dhcp4 running with a memfile backend with no persistence had not been reached. All that we can say is that the peak rate will be above 8,500 leases/second. The graphs for dropped packets appear a bit noisy, but that is because the absolute figures are very low: 0.022% of 8,500 packets/second is under 2 packets/second.

DHCPv6
[Graphs: Kea-dhcp6 with memfile backend, no persistence]

As with the V4 case, the current measurements have not yet found the maximum lease rate; all we can say is that it is above 8,500 leases/second. The figures for failed requests are still low, although higher than in the V4 case. What is odd is the shape of the failed-request curve, which reaches a peak at a request rate of about 7,600 requests/second and then tails off. The reason is unknown, although it could be an artifact of the measurement harness.

Memfile (persistence)

This section presents results for Kea running with a memfile backend and persisting leases to a file on disk. Like the non-persistent case, the database was not cleared between tests in a run, so the figures are essentially those for lease renewals.

DHCPv4
[Graphs: Kea-dhcp4 with memfile backend, persistence enabled]

It is clear from the graphs that the maximum lease grant rate had not been reached; all we can say is that it is above 7,000 leases/second. There appears to have been some oddity in the measurements about the 6,500 requests/second mark, where the failed request rate peaked. As with other measurements where the failed request rate rises then falls, this may well be an artifact of the framework. It is entirely possible that something was happening on the perflab systems/network at the time those measurements were taken; a check will be made as to whether the increased packet-loss rate in the measurements of the various configurations occurred around the same date/time.

DHCPv6
[Graphs: Kea-dhcp6 with memfile backend, persistence enabled]

In contrast with the V4 case, the V6 results show a definite maximum lease renewal rate. The lease rate rises linearly with the request rate until it reaches 6,600 leases/second; beyond this, the number of failed requests starts to rise dramatically. As in the DHCPv6 MySQL backend case, once the maximum lease rate has been reached, increasing the request rate causes the performance to drop. This drop is associated with an increase in the packet-loss rate and round-trip times.

One noticeable feature of the graphs is that they are far noisier than those of the other configurations. This is almost certainly due to the interaction between the spacing of the request rates used to generate the measurements and the binning of those rates for the graphs. For the database backends, the interval in request rate between successive measurements was of the order of the bin size; for the memfile measurements, the interval was far larger.
