Considerations when choosing and configuring load balancers
  • 16 Oct 2018

Question:

What sort of features and functionality should be borne in mind when choosing and configuring a load balancer, either to spread the load of recursive queries between a number of recursive servers, or to provide authoritative DNS replies based on application or service availability?

Answer:

There are two main considerations. The first is that the load balancer must not itself become a source of resource problems (i.e. a bottleneck, or the means by which a service outage cascades to all the servers that it provides front-end access to). The second is that it must respond correctly according to the DNS protocol. Failure to do so can cause client access problems for servers or websites, because recursive servers that do follow the protocol properly will conclude from an incorrect response that a host doesn't exist or is temporarily unavailable.

The following points may help you make your implementation decision. They are followed by details of some load balancer implementation failings that we have encountered.

  • Many load balancers maintain a 'state' table of queries that they've passed through while they await the server response. Under a heavy load of queries that don't receive a timely response from the nameserver(s), for whatever reason, this can make the load balancer itself a bottleneck or a cause of problems.
  • BIND 9 attempts to de-duplicate queries issued by the same client so it will be more efficient if, over a period, all client queries from a specific client address are sent to the same server. (But in the case of popular queries from multiple clients, all the servers behind the LBs are going to be doing the same work and have the same cache population anyway; the LBs are simply reducing the query load per server.)
  • Some load balancers monitor the status of the DNS servers they're distributing queries to. Care needs to be taken with the configuration and with per-server loading so that, when a server is marked as 'failed', the remaining servers are not overloaded and the outage does not cascade.
  • When configuring the load balancer topology, you also need to consider the clients' list of resolvers and the most commonly seen fall-back behavior from the first-listed nameserver. This can impact the 'fall-out' pattern in case of an outage or query storm.
  • If there's a DNS proxy between the clients and the load balancers that handles the resolver fall-back instead of the client resolv.conf, then you need to consider its tuning settings instead of, or as well as, the clients' settings when deciding on the best settings or choice of LB boxes.
  • When planning your installation, consider implementing some volume-based packet filtering/shaping between your clients and your DNS 'solution'. This provides a degree of protection for your service if there's a peak in queries for a particular name/domain, or from a particular client or range of clients. It is especially useful for recursive servers when the target domain's servers are unreachable or the response is SERVFAIL.
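
The points above about keeping a client's queries on one server, and about not overloading the remaining servers when one fails, can be illustrated with a small sketch. It uses rendezvous (highest-random-weight) hashing on the client address. This is one possible technique chosen for illustration, not a description of how any particular load balancer product works. The function name `pick_server` and the addresses (taken from the documentation ranges) are hypothetical.

```python
import hashlib


def pick_server(client_ip, servers, healthy=None):
    """Choose a backend for a client using rendezvous hashing.

    Hypothetical helper for illustration only.
    `servers` is the full list of backend addresses; `healthy` is an
    optional set of backends currently passing health checks
    (None means every server is treated as healthy).
    """
    candidates = [s for s in servers if healthy is None or s in healthy]
    if not candidates:
        raise RuntimeError("no healthy DNS servers available")

    def weight(server):
        # Each (client, server) pair gets its own pseudo-random score.
        digest = hashlib.sha256(f"{client_ip}|{server}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    # The client always goes to its highest-scoring healthy server.
    return max(candidates, key=weight)


if __name__ == "__main__":
    backends = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"]
    client = "198.51.100.7"
    print("normal:", pick_server(client, backends))
    # Simulate one backend failing its health check.
    still_up = set(backends) - {"192.0.2.2"}
    print("degraded:", pick_server(client, backends, still_up))
```

With this scheme a given client address maps to the same server for as long as that server stays healthy, which suits per-client de-duplication. When a server is removed, only the clients that were using it are reassigned, and each of them moves to its own next-highest-scoring server rather than all moving to a single neighbour. Whether the remaining servers have enough headroom for that extra load still has to be checked separately.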

Check that your proposed (or implemented) solution behaves correctly when responding to queries:

  • Check for responses to AAAA queries as well as A queries for names in the zones being served. Commonly seen failures are:
    • Returning NXDOMAIN instead of NOERROR when a name has an A record but not an AAAA record.
    • Returning the wrong SOA record with NXDOMAIN responses (doesn't match the zone being served).
  • Check that the servers respond correctly for other record types - such as NS, SOA, TXT and MX.
  • Confirm that CNAME records are served correctly in response to A and AAAA queries.
  • Check that the NS and SOA records provided in authority and additional sections of query replies are the correct ones for the zone of the query that is being responded to.
  • When responding authoritatively, or passing on authoritative replies, the AA (Authoritative Answer) bit should be set in the header.
  • Check all of the above over TCP as well as UDP.
  • Does the load balancer support DNSSEC?
  • Does the load balancer support EDNS0 and larger UDP packet sizes?
  • Does the load balancer allow (pass through without interference) and/or support new EDNS options (such as EDNS cookies)?
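
Several of the checks above (response code for A versus AAAA, the AA bit, UDP versus TCP) can be scripted. The following is a minimal sketch using only the Python standard library. It builds a plain query without EDNS and decodes only the fixed 12-byte DNS header, so it does not inspect the SOA, NS or CNAME records themselves. All function names are hypothetical, and the `exchange` helper assumes an IPv4 server address listening on port 53.

```python
import socket
import struct

QTYPE = {"A": 1, "NS": 2, "CNAME": 5, "SOA": 6, "MX": 15, "TXT": 16, "AAAA": 28}
RCODE = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_query(name, qtype, qid=0x1234):
    """Encode a single-question query (class IN, RD flag set, no EDNS)."""
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
        if label
    )
    return header + labels + b"\x00" + struct.pack("!HH", QTYPE[qtype], 1)


def parse_header(message):
    """Decode the fixed 12-byte header of a DNS response."""
    qid, flags, _qd, an, _ns, _ar = struct.unpack("!HHHHHH", message[:12])
    rcode = flags & 0x000F
    return {
        "id": qid,
        "aa": bool(flags & 0x0400),  # Authoritative Answer bit
        "tc": bool(flags & 0x0200),  # truncated, retry over TCP
        "rcode": RCODE.get(rcode, str(rcode)),
        "ancount": an,
    }


def _read_exact(conn, n):
    """Read exactly n bytes from a TCP connection."""
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed early")
        buf += chunk
    return buf


def exchange(server, name, qtype, use_tcp=False, timeout=3.0):
    """Send one query to `server` (IPv4 assumed) and return the parsed header."""
    query = build_query(name, qtype)
    if use_tcp:
        with socket.create_connection((server, 53), timeout=timeout) as conn:
            # DNS over TCP prefixes each message with a 2-byte length.
            conn.sendall(struct.pack("!H", len(query)) + query)
            (length,) = struct.unpack("!H", _read_exact(conn, 2))
            return parse_header(_read_exact(conn, length))
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, (server, 53))
        data, _addr = sock.recvfrom(4096)
        return parse_header(data)


def aaaa_nxdomain_problem(a_reply, aaaa_reply):
    """True if a name that has A records is reported as NXDOMAIN for AAAA."""
    return (
        a_reply["rcode"] == "NOERROR"
        and a_reply["ancount"] > 0
        and aaaa_reply["rcode"] == "NXDOMAIN"
    )
```

As a usage example, calling `exchange(addr, "www.example.com", "A")` and `exchange(addr, "www.example.com", "AAAA")` against the load balancer's address, then passing both results to `aaaa_nxdomain_problem`, flags the first failure listed above. Repeating the calls with `use_tcp=True` covers the TCP check. For the DNSSEC, EDNS0 and EDNS cookie questions, a full-featured tool such as dig is likely to be a better fit than this sketch.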