BIND Best Practices - Recursive
1) Dedicate a machine
It is strongly recommended that you run BIND on a server dedicated to DNS only. Reasons include:
- Minimized risk of impact to DNS services as a result of other applications consuming server resources (perhaps due to an attack on those services, or due to application error).
- Conversely, minimized risk to other applications as a result of BIND consuming all system or network resources.
- Reduced likelihood of unauthorized access to the DNS server (e.g. via a code defect and root access exploit made possible via another application).
- Improved ability to monitor DNS server performance (since the server is dedicated to one service).
- Improved ability to troubleshoot problems.
Do not combine authoritative and recursive name server functions on the same server; have each function performed by a separate set of servers.
This advice primarily concerns separating public-facing authoritative services from internal client-facing recursive services. Administrators may, for convenience, choose to serve some internal-only zones authoritatively from their recursive servers, having determined that the benefit outweighs the risks of doing so.
If you do share recursive and authoritative functions on one server, a problem that affects only authoritative service (for example, one that causes all of your authoritative servers to fail) will break your recursive service at the same time.
2) Ensure the network is ready
Ensure (and confirm through testing) that your infrastructure supports EDNS0 and large UDP packet sizes. See How to verify a clean network path for DNS resolution by recursive servers.
Disable the use of stateful firewalls/packet filters on your servers for outbound query traffic (iterative queries made by a recursive server to authoritative Internet servers). Administrators often consider the impact of stateful firewalls and load balancers on inbound client queries, but then fail to consider their impact on resolver query traffic.
3) Basic Defensive Measures
Do not operate an open resolver; open resolvers will be discovered and co-opted for use in DNS reflection attacks against third parties. Use BIND access control mechanisms such as address match lists to restrict recursive query service to known and authorized clients.
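As a minimal sketch of the access-control advice above, the following Python helper renders an address match list and an allow-recursion clause from a list of client networks, validating each prefix first. The ACL name "trusted-clients" and the documentation prefixes are placeholders, not values from this article; substitute your own networks.

```python
import ipaddress


def render_recursion_acl(name, networks):
    """Return named.conf text that limits recursion to the given networks."""
    # ip_network() raises ValueError on a malformed prefix or one with host
    # bits set, so typos are caught before they reach the server config.
    nets = [ipaddress.ip_network(n) for n in networks]
    elements = " ".join(f"{net};" for net in nets)
    return (
        f'acl "{name}" {{ {elements} }};\n'
        "options {\n"
        "    recursion yes;\n"
        f'    allow-recursion {{ "{name}"; }};\n'
        "};\n"
    )


# Placeholder networks (RFC 5737 / RFC 3849 documentation ranges).
print(render_recursion_acl("trusted-clients", ["192.0.2.0/24", "2001:db8::/32"]))
```

named.conf allows only one options statement, so in practice the two option lines would be merged into the existing one rather than pasted as a second block.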
Ensure that you have query port randomization enabled. A useful testing tool provided by the Domain Name System Operations Analysis and Research Center can be found here: https://www.dns-oarc.net/oarc/services/porttest
Run BIND as an unprivileged user.
To open low-numbered UDP and TCP ports BIND must be launched as root, but an alternate uid can be specified using the -u command line argument; after opening needed resources named will change its runtime uid to an unprivileged account.
Configure your recursive servers to use DNSSEC validation (this is the default behavior). DNSSEC-validation will prevent cache-poisoning of records that are provided by DNSSEC-signed authoritative zones.
4) Design for resilience
Run multiple, distributed recursing resolvers, avoiding single points of failure in critical resource paths.
A variety of strategies are available (including anycast and load-balancing) to ensure robust geographic and network diversity in your deployment. Those for whom high availability of DNS service is particularly critical may also wish to consider diversity of nameserver software versions and code base (e.g. running at least two different major versions of BIND on their servers, as well as DNS server software from other vendors).
Be sure to run currently-supported version(s) of BIND in your environment. See Which version of BIND do I want to download and install? for further discussion of this.
5) Provision adequate capacity
Provision sufficient capacity to handle burst traffic up to at least 150% of normal level (see also the above point on load-balanced configurations - adequate overprovisioning will help to avoid some of the pitfalls).
Remember that excess capacity must take into account not only server CPU and memory resources but also send and receive capacity along the entire network path.
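The 150% figure translates into simple arithmetic. The sketch below is illustrative only: the per-server throughput and the idea of adding one spare server are assumptions made for the example, and real per-server capacity has to come from your own load testing.

```python
import math


def required_capacity(normal_peak_qps, burst_factor=1.5):
    """Queries per second the deployment should sustain during a burst."""
    return normal_peak_qps * burst_factor


def servers_needed(normal_peak_qps, per_server_qps, burst_factor=1.5, spares=1):
    """Servers required to carry the burst load, plus spares for failures.

    per_server_qps and spares are hypothetical planning inputs.
    """
    target = required_capacity(normal_peak_qps, burst_factor)
    return math.ceil(target / per_server_qps) + spares


# Example: 40,000 qps at normal peak, servers measured at 25,000 qps each.
print(required_capacity(40_000))        # 60000.0
print(servers_needed(40_000, 25_000))   # 4
```

The same multiplication applies to link bandwidth and packet rates along the network path, not only to the servers themselves.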
Ensure that system outbound network buffers are large enough to handle your rates of outbound query traffic. Some OS implementations (particularly some versions of Linux) assume low rates of outbound network traffic by default, but a recursive DNS server generates significant volumes of outbound traffic, both in responding to client queries and in iterating on cache misses.
In general BIND sets reasonable default limits on most options, but the default value for cache size is 90% of system physical memory (on servers that support detection of physical memory; otherwise unlimited). Be aware that the same automatic max-cache-size is set for each view if named is configured to run with multiple views and caches. Set an appropriate limit on max-cache-size to avoid growth without limit, but also to provide sufficient capacity for a good cache hit rate on client queries. Additionally, provision enough system memory to allow storage of other BIND structures in addition to the resolver cache.
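To see why the per-view default matters, the sketch below compares the worst case under the 90% default with an explicit per-view budget. The 50% cache share is an arbitrary illustration, not a recommendation; choose a share that leaves room for named's other structures and for the operating system.

```python
def default_total_cache_bytes(physical_bytes, views):
    """Worst-case total if every view keeps the automatic 90% limit."""
    return int(physical_bytes * 0.9) * views


def per_view_cache_bytes(physical_bytes, views, cache_share=0.5):
    """Split a chosen share of RAM evenly across the per-view caches.

    cache_share is a hypothetical planning figure.
    """
    if views < 1:
        raise ValueError("at least one view is required")
    return int(physical_bytes * cache_share) // views


GIB = 2 ** 30
ram = 16 * GIB
# With three views the defaults could, in the worst case, ask for more
# memory than the machine has.
print(default_total_cache_bytes(ram, 3) > ram)   # True
print(per_view_cache_bytes(ram, 4) // GIB)       # 2
```

The resulting per-view figure is what would be set as max-cache-size in each view.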
6) Establish monitoring
Put in place monitoring scripts to continually check health of servers and alert if conditions change substantially. See this article for more detailed recommendations on monitoring.
Conditions to monitor include:
- network throughput and buffering (inbound/outbound)
- filesystem utilization (on the log filesystem and also the filesystem containing the named working directory)
- query types, answer types
- cache hit rate
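Cache hit rate is usually derived from cumulative counters sampled at intervals rather than read directly. The sketch below assumes a monitoring script has already collected (hits, misses) pairs from whatever statistics source you use; how those counters are obtained and named is outside this example.

```python
def cache_hit_rate(prev, curr):
    """Hit rate over the interval between two counter samples.

    Each sample is a (hits, misses) tuple of cumulative counters.
    Returns None when no rate can be computed.
    """
    hits = curr[0] - prev[0]
    misses = curr[1] - prev[1]
    if hits < 0 or misses < 0:
        # Counters went backwards: named was probably restarted.
        return None
    total = hits + misses
    return hits / total if total else None


print(cache_hit_rate((1000, 200), (1900, 300)))  # 0.9
```

Working from deltas rather than lifetime totals keeps the figure responsive to recent changes, which is what an alert needs.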
Examine logs periodically for error and warning messages which may provide a tip-off for incipient problems before they become critical.
Review the logging configuration to ensure it meets your requirements. BIND's logging defaults are generally sane (passing most of the work to syslog), but may not line up with organizational policy and/or desired data collection/retention standards.
When using size-limited files for logging, plan the size of the files and number to retain so that an increased level of logging due to a problem is unlikely to cause the logs from the start of the problem to become unavailable. The exact settings will depend on how quickly problems can be detected and the details of the baseline retention policy.
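The sizing question above reduces to a volume estimate. As a rough sketch, with the logging rate during an incident and the detection time as assumed inputs that you would measure or estimate for your own site:

```python
import math


def versions_to_retain(incident_bytes_per_sec, detection_secs, file_size_bytes):
    """Rotated log files needed to keep the start of an incident on disk.

    Assumes logging runs at incident_bytes_per_sec from the start of the
    problem until someone notices it detection_secs later.
    """
    volume = incident_bytes_per_sec * detection_secs
    return math.ceil(volume / file_size_bytes)


# Example: 50 kB/s of log output during a problem, noticed after two
# hours, with log files capped at 100 MB each.
print(versions_to_retain(50_000, 2 * 3600, 100_000_000))  # 4
```

Any baseline retention requirement is added on top of this figure, and the total (file size times number of files) must fit on the log filesystem.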
Query logging adds substantial overhead (on the order of 10x) and so should not be turned on without careful consideration. When query logging is required, use of dnstap can minimize the cost.
By design, and for security purposes, the most common mode of failure for BIND is intentional process termination when it encounters an inconsistent state. An automated minder process capable of restarting BIND intelligently is recommended if you do not have 24-hour operations support (and possibly even if you do). It is especially helpful if any such script can checkpoint and archive the logs when this happens.
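One possible reading of "intelligently" is that the minder should avoid a tight crash loop: back off between restarts and hand over to a human if named keeps failing. The decision logic could be sketched as follows; the window, delays, and restart limit are hypothetical values, and the actual restart and log-archiving steps are left to the surrounding script.

```python
def restart_delay(restart_times, now, window=600.0, base=5.0, max_restarts=5):
    """Seconds to wait before restarting named, or None to stop and alert.

    restart_times holds the timestamps (in seconds) of earlier restarts.
    """
    # Only restarts inside the recent window count toward the backoff.
    recent = [t for t in restart_times if now - t <= window]
    if len(recent) >= max_restarts:
        return None  # repeated failures: escalate instead of looping
    return base * (2 ** len(recent))


print(restart_delay([], 1000.0))              # 5.0
print(restart_delay([900.0, 950.0], 1000.0))  # 20.0
```

Archiving the logs before each restart, as recommended above, preserves the evidence from every failure rather than only the last one.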
7) Protect your users
When it was first introduced, DNS response re-writing was regarded as tantamount to 'lying' about the DNS. However, years later, this technique has become very widely deployed to replace answers that would direct users to malware or phishing sites with benign answers, or no response. Response Policy Zones (RPZ), which implement this technique in BIND, are not enabled by default. RPZ requires a data feed, effectively the equivalent of a spam blocklist, from a data provider. Several providers offer free community blocklist subscriptions in addition to premium services. If you enable RPZ it is important to whitelist your own zones to ensure you never inadvertently block access to them.
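Whitelisting in RPZ is typically expressed with the rpz-passthru action. As a small sketch, the helper below generates passthru records for a list of your own zones; the zone names are placeholders.

```python
def rpz_passthru_records(zones):
    """RPZ records exempting each zone and its subdomains from rewriting."""
    lines = []
    for zone in zones:
        name = zone.rstrip(".").lower()
        # Owner names are written relative to the policy zone's origin,
        # so they carry no trailing dot; the CNAME target does.
        lines.append(f"{name} CNAME rpz-passthru.")
        lines.append(f"*.{name} CNAME rpz-passthru.")
    return lines


for record in rpz_passthru_records(["example.com", "example.net."]):
    print(record)
```

These records would normally live in a locally maintained policy zone listed ahead of any provider feeds, since policy zones listed earlier take precedence.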
8) Prepare to troubleshoot
Prior to any trouble, ensure that a strategy is in place for collecting post-mortem information if a server does encounter a problem. Measures to consider include:
- Building named with debug symbols enabled.
- Enabling the BIND XML statistics channel for easy data collection.
- Designing an appropriate logging strategy and reserving sufficient space on the log filesystem for information to be collected for a significant context period before an event (several hours at least, 24 hours+ preferred.)
- Ensuring that the uid under which named is running has write permission sufficient to write a core image to its working directory if it segmentation faults and to write named.dump or named.run files if requested by operator.
See What to do with a misbehaving BIND server and What to do if your BIND or DHCP server has crashed for guidance on troubleshooting problems and the type of information that is useful to collect in those circumstances.
Observe query loads periodically to establish baseline expectations. This will enable you to monitor for anything unusual - as defined by the range of 'normal' for your specific operational environment.
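A baseline can be as simple as a range around the mean of past observations. The sketch below uses mean plus or minus three standard deviations, which is an assumption chosen for illustration; traffic with strong daily or weekly cycles is better compared against samples from the same hour or weekday.

```python
import statistics


def baseline_range(samples, k=3.0):
    """Return (low, high) bounds of 'normal' from historical samples."""
    mean = statistics.mean(samples)
    spread = statistics.pstdev(samples)
    return (mean - k * spread, mean + k * spread)


def is_unusual(value, samples, k=3.0):
    """True when value falls outside the baseline range."""
    low, high = baseline_range(samples, k)
    return not (low <= value <= high)


history = [100, 110, 90, 105, 95]  # e.g. queries per second at past checks
print(is_unusual(104, history))  # False
print(is_unusual(400, history))  # True
```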
9) Maintain and update
Subscribe to the firstname.lastname@example.org mailing list to stay informed of updates and security issues. All ISC mailing lists are available at https://www.isc.org/mailinglists/.
You should have a strategy that includes both a planned upgrade path, so that you can take advantage of improved features and functionality, and a plan for how you will respond if a security advisory is released that has the potential to impact your servers and services. See Which version of BIND do I want to download and install? for more information.