BIND 9.16 and newer are able to take advantage of kernel load-balancing of server sockets on systems which support it, including Linux (SO_REUSEPORT) and FreeBSD (SO_REUSEPORT_LB). This is enabled by default, and changes the way in which inbound client packets are distributed to server threads for processing.
Starting with BIND 9.16.28 and BIND 9.18.2, we provided a configuration option, reuseport, to disable this new mode of inbound client packet handling and revert to the traditional poll-based method of distributing incoming client packets to available threads for handling.
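As a sketch of how this looks in practice (the option name is as documented; the surrounding fragment is illustrative), reuseport is set in the options block of named.conf:

```
options {
    // Revert to the traditional poll-based method of distributing
    // inbound client packets; the default is "reuseport yes;".
    reuseport no;
};
```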
Inbound client packet processing with reuseport no;
In older versions of BIND, listener threads, when available to handle new inbound client traffic, would poll the server sockets listening on port 53. They would then attempt to read from the sockets that had waiting packets.
If you want to visualise this, think of a BIND server as being a set of service desks for which there is a single queue. When an agent becomes free, they flag that they can handle a new customer, and call one over from the central queue.
A customer at the head of the queue may actually be called by multiple agents who all become available at the same time, so only one agent 'gets' this customer on this call. However, the queue is constantly moving, and customers are only sent to agents who are available to handle them right away.
This polling approach is a less efficient way of handling and distributing inbound I/O amongst a set of processing threads. In basic tests (on a simple authoritative-only server) we observed both that overall packet-handling capacity was lower than with reuseport yes; and that client traffic was sometimes not evenly distributed amongst the packet-handling threads.
Kernel load-balancing of inbound client packets with reuseport yes;
The option reuseport yes; instructs the kernel (on systems that support this feature) to distribute incoming socket connections amongst the networking threads based on a hashing scheme that takes into account the client source IP address and, ideally, the client source port as well, although some older network interface cards (NICs) do not support hashing on the source port. For more information, see the receive network flow classification options (rx-flow-hash) section of the ethtool manual page. The default is yes.
To visualise this, consider the same set of service desks, but now each has its own independent queue. As customers arrive, a concierge instructs them to join a specific agent's queue, using a selection system that should distribute customers evenly between the queues.
Enabling reuseport significantly increases general throughput when incoming traffic is distributed uniformly onto the threads by the operating system.
Assuming that the server's network cards support the full functionality of rx-flow-hash, the hash algorithm uses both the client source IP address and the client source port to decide which thread each packet should be sent to for handling. This does not work as intended when all traffic comes from a single IP address and source port; when the network card only supports hashing on the address and not the source port; or when the traffic is significantly skewed, with a large proportion coming from a very small number of clients.
Observations from more complex server environments
Whilst making use of kernel load-balancing of inbound client traffic is more efficient for evenly distributing the inbound traffic-handling workload, this strategy cannot take into account the availability or "busy-ness" of the threads to which it distributes packets.
Now we have a new visualisation for you: imagine that one of the service desk agents has been interrupted by an important phone call and temporarily pauses processing of their queue. The concierge keeps apportioning new customers, oblivious to the fact that one of the queues isn't moving. Most phone calls are short, and all of the agents receive them. The fluctuating queue lengths mean that, although the aggregate processing throughput is higher with kernel load-balancing enabled (and the overall average time to be processed is therefore lower), some individuals will have a worse experience than with the single-queue system if they happen to be at the back of a temporarily much longer per-agent queue.
Since the behaviour of a BIND server is highly dependent on its configuration, environment, and specific query load, we strongly recommend testing and evaluating for yourself which setting of the reuseport option delivers the best results for your own operation and target metrics.
When might reuseport yes; be operationally better for a BIND server?
- Most standard resolver-only environments with both a substantial cache and with a wide range of different clients all making a variety of different queries.
- Authoritative servers with relatively static zones, or whose zones are usually only updated in small increments.
When might reuseport no; be operationally better for a BIND server?
- DNS server environments that need to prioritise consistent short client RTTs over overall faster throughput.
- Complex authoritative server environments with a high rate of zone updates, particularly where many zone updates are of substantial incremental size.
- Complex resolver server environments using frequently-updated Response Policy Zones.
- DNS server environments that have intermittent 'big' background tasks, such as Catalog Zones.