There are two scenarios where DNS and load balancing may be mentioned in the same sentence:
- Incoming queries need to be distributed across a pool of DNS servers. This is load balancing of the DNS service itself.
- Use of DNS responses to influence clients to send subsequent requests for other types of traffic (e.g. web or mail) to one of a number of application servers. This is the use of DNS to load balance other applications.
Why do load balancing?
Typical reasons for wanting to distribute incoming requests across multiple servers include:
- Load: one server cannot handle all requests, so multiple servers, all doing the same job, are needed.
- Resilience: ensuring that the service is still available if one of the servers dies or is unreachable for some reason.
- Maintenance: being able to take a server offline for scheduled work, such as software upgrades, without affecting service to the clients.
- Growth: adding more capacity for future expansion with minimum disruption.
Load balancing versus load sharing
Load balancing is usually described as the process of distributing traffic fairly between a number of application servers, often based on their ability to handle it. Thus each server is loaded roughly equally, even if they differ in CPU power, memory, etc. This requires careful monitoring of each server and a definition of what exactly constitutes load. For DNS, for example, one measure of load might be the number of queries per second being sent to a server.
Load sharing is usually described as the process of farming out traffic to a number of servers without knowing in advance how busy those servers are: thus there could be a traffic imbalance between them. This is why anycast is a load sharing technique rather than a load balancing one: network devices either have a route or they don't, and there is no metric to say in what proportions different routeing entries should be used. Routeing protocols supporting UCMP - Unequal Cost MultiPath (see ECMP, below) - would help here, but so far none exist.
Load balancing/sharing of DNS
There are two common methods used to spread client queries across a bank of DNS servers:
- Install a proprietary load balancer in front of them.
- Anycast. This is more load sharing than load balancing.
Taking these in order:
Hardware load balancers (also some software load balancers)
Many different brands are available, but their basic operation is very similar.
- Application servers (in this case DNS) are grouped into a pool with its own IP address - known as a VIP (Virtual IP address) - which is what is given to clients (see the sketch after this list).
- The load balancer receives requests to the VIP, selects a pool member, translates the destination address to that of the chosen member and forwards the traffic.
- NAT is reversed for the reply.
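For illustration, the zone file sketch below (hypothetical, using a documentation address) shows how only the VIP appears in DNS; the real addresses of the pool members behind the load balancer are never published.
        IN NS ns1.example.com.
ns1     IN A  203.0.113.53 ; VIP held by the load balancer, not any single server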
One job of the load balancer is to poll each member of the pool regularly, to see if it is still functioning and capable of handling requests: these polls are usually known as health checks. If a member is not available, the load balancer removes it from the pool temporarily and continues distributing traffic across the remaining members.
Anycast
This takes advantage of the connectionless nature of the Internet Protocol. Every packet is treated as an independent entity and routers direct packets to their destination based on the states of their routeing tables at that moment. A consequence of this is that different routers may direct traffic for the same destination address to different places as their routeing tables may differ.
Anycast works by having each DNS server itself advertise exactly the same address into the network and letting the network figure out how to route traffic to that address. To do this advertising, an extra piece of software is usually required that speaks the dynamic routeing protocol used by the network devices.
In effect, the VIP, which is external to the servers if a hardware load balancer is deployed, is now internal to the servers themselves and the network routers themselves are performing the load balancing.
However, the application servers now have an extra job to do, which is to health check their own service and withdraw the VIP from being advertised if it should not be working for some reason, so that network routers can adjust their routeing tables. If this is not done, traffic risks being blackholed.
Another requirement of anycast is that the chosen routeing protocol must support ECMP. This is because multiple servers in the same location, each advertising the same address, will all want to receive traffic. In practice this limits the choice of protocol to one of OSPF, BGP or IS-IS.
ECMP stands for Equal Cost MultiPath and is a technique available in some dynamic routeing protocols to allow a router to install multiple routes with different next hops into its table and to share traffic across them. This is useful in an anycast DNS (or other protocol) setup because it permits more than one application server to receive traffic simultaneously, allowing for local redundancy and ease of scaling for increased capacity.
Load balancing/sharing other application protocols using DNS
General
There are a few ways to achieve some degree of load balancing or sharing using DNS responses, depending on the application traffic that needs to be balanced and, in some cases, the capabilities of the clients to understand and act on what they are being told.
One simple method, which can be problematic, is to return multiple address records for a given name query. This forces clients to make a choice of which address to use for a specific connection. Since it relies on clients making that choice, this is not load balancing but load sharing, at best. The zone file snippet below shows a mail server name defined with three different IP addresses: the use of mail for this example is continued later.
        IN MX 10 mail.example.com.
....
mail    IN A     192.0.2.4
        IN A     192.0.2.5
        IN A     192.0.2.6
What we see here is a mail server for this zone (which could be example.com) called mail.example.com. This server either has three interfaces, or there are three servers with (in this case) consecutive IP addresses.
A mail client will query for the MX record, which returns the server FQDN mail.example.com. It then queries for the address record(s) for the FQDN returned in the MX response, which returns the three IPv4 addresses shown.
Some problems with this approach are:
- If there are multiple answers of the same type in a DNS response, clients typically (but not always) choose the answer that comes first. If answers were always given in the same order, this would place undue strain on that server, leaving the rest idle. However, most DNS software will return multiple answers in a random order (BIND does this by default, though it can be changed in configuration, as sketched after this list), which helps with fairness of traffic distribution.
- Sometimes clients do not accept multiple answers in a single response, leading to failure.
- Clients have no knowledge of whether a given address is reachable (from their starting point), or if it is, whether the application they are trying to reach is running at the time. Thus there is no resilience or guaranteed failover, unless implemented in the client application itself.
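On that configuration point: in BIND, the ordering of records within a response is controlled by the rrset-order statement in named.conf. The minimal sketch below selects cyclic (round-robin) ordering in place of the random default.
options {
    rrset-order { order cyclic; };  // rotate through the answers; the default order is "random"
};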
An improvement on this is to return only a single answer to the client: one that has been chosen to be available and have capacity. Thus client choice is removed from the equation and control of traffic distribution is maintained by the network operator. This is not a standard feature of most DNS server software, so to achieve this usually requires the introduction into the network of specialist "intelligent" DNS servers that can monitor service availability, in the same way as dedicated load balancers, and provide tailored DNS responses.
If such an "intelligent" load balancer is going to return different answers for the same name at different times and clients and/or their resolver wish to validate the answers, each of these name/address responses will need a unique RRSIG record. The load-balancer could generate these on the fly. Or it might be able to pre-compute them and store them alongside each possible response. Either way, it would be important that the answer and its RRSIG have the same TTL, so that they both expire together and validators don't get confused which signature to apply to what answer.
For email communication DNS has the MX record, which provides clients with one or more FQDNs, each with an associated preference value. Clients (MTAs) making MX queries are written to understand what these priorities mean and how to use them, thus they are more predictable than other clients using address records directly. Each FQDN may resolve to multiple address records, which leads to a choice and possible lack of resilience, as noted above. Equal cost load balancing is built into the protocol, wherein if multiple answers are returned, all with the same preference value, mail will be shared between the destinations referenced. Failover is also built into the protocol, wherein if the mail client cannot connect to the server/servers referenced by the best (lowest) preference it will try one with a worse (higher) preference, if available.
Although the section in RFC 1035 that defines the MX record format calls the numeric value "preference", this could be a bit confusing because it is natural to think of higher preferences being better, whereas here, lower preferences are better.
It can be more useful to think of this value instead as a cost to reach the server in question, with lower cost servers being more preferable.
The zone file snippet below shows three MX records, all for the name of the zone (which might be example.com), all with the default zone TTL and all with the same preference (10). A mail client receiving these records would load balance mail traffic across all three (sets of) servers: mail, mail1 and mail2.
        IN MX 10 mail.example.com.
        IN MX 10 mail1.example.com.
        IN MX 10 mail2.example.com.
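To show the failover behaviour as well, the variant below (illustrative) balances mail between mail and mail1 at preference 10, with mail2 at preference 20 acting purely as a backup, used only when neither preference-10 server can be reached.
        IN MX 10 mail.example.com.
        IN MX 10 mail1.example.com.
        IN MX 20 mail2.example.com.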
Other applications
The principle of the MX record was extended to other application protocols by the introduction of the SRV (SeRVice) record. Like the MX record, it returns one or more FQDNs, each with a priority value.
The RFC defining the SRV record uses the term priority for the numerical value representing the importance of a record, whereas the equivalent value in an MX record is called preference (see above). In both cases, a client should choose answers with a lower value.
With SRV, each answer also includes additional weight and port number parameters. Clients, rather than asking for a name directly, request details of a service at a domain, referenced by service type and transport protocol. SRV responses contain the FQDN of a server/servers capable of providing that service, which port it runs on, a priority (which works exactly like the preference in MX records) and a weight, which allows clients to share their requests unequally across multiple destination endpoints and thus achieve a degree of unequal load balancing. However, only a few types of application support the use of the SRV record, one such being Microsoft Active Directory. No web browser (to date) supports them.
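As an illustration (hypothetical names and values), the snippet below advertises an LDAP service over TCP, of the kind Active Directory relies on. At priority 10, clients send roughly 60% of requests to ldap1 and 40% to ldap2, both on port 389; the priority-20 record is used only if neither priority-10 server responds.
_ldap._tcp   IN SRV 10 60 389 ldap1.example.com.
             IN SRV 10 40 389 ldap2.example.com.
             IN SRV 20  0 389 ldap3.example.com.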
Other techniques
A more recent RFC (RFC 9460) introduces the SVCB record, which extends the functionality of the SRV record even further.
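As a brief sketch (illustrative values), the HTTPS record - the web-specific form of SVCB - lets a zone steer clients towards a service endpoint together with connection parameters, such as the application protocols on offer and an address hint:
www   IN HTTPS 1 svc.example.net. alpn=h2 ipv4hint=192.0.2.4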
ECS (EDNS Client Subnet) can be used to distribute requests from a diverse client population across a geographically spread set of resources based on the clients' locations, derived from information about their source addresses. It is more usually used to direct clients to local instances of a service, for speed of response and to avoid unnecessary backhaul traffic. As such it is more of a load distribution tool, but warrants a mention here for completeness.