Best Practices for those running Recursive Servers
Author: ISC Support | Reference Number: AA-00874 | Created: 2013-03-08 13:16 | Last Updated: 2013-12-27 18:27

DRAFT ARTICLE "under construction"
... your feedback is encouraged and warmly welcomed!

The following is general advice for operating purely recursive (i.e. "caching-only") nameservers.  Treat the items below as starting points for determining the settings and controls most appropriate for your environment, depending on your size, operational needs and security concerns.  They are not a complete and comprehensive set of recommendations for all environments.  You should also consider operational best practices that are not specific to DNS, such as BCP 38 and BCP 84.  A good business case for the deployment of BCP 38 can be found here: ftp://ftp.ripe.net/ripe/docs/ripe-432.pdf.  The specific DNS case is discussed in BCP 140 (RFC 5358).  See also http://archive.icann.org/en/committees/security/sac004.txt.

  • It is strongly recommended that you run BIND on a server dedicated to DNS only.  Reasons include:
    • Minimized risk of impact to DNS services as a result of other applications consuming server resources (perhaps due to an attack on those services, or due to application error).
    • Conversely, minimized risk to other applications as a result of BIND consuming all system or network resources.
    • Reduced likelihood of unauthorized access to the DNS server (e.g. via a code defect and root access exploit made possible via another application).
    • Improved ability to monitor DNS server performance (since the server is dedicated to one service).
    • Improved ability to troubleshoot problems.

  • Run BIND as an unprivileged user.

    To open low-numbered UDP and TCP ports BIND must be launched as root, but an alternate uid can be specified using the -u command line argument; after opening needed resources named will change its runtime uid to an unprivileged account. (Please see the end of this document for note (1) concerning use of this feature under Linux.)

  • If following the preceding advice (running BIND as an unprivileged user on a dedicated server) chrooting is "de-emphasized." Our operations experts feel that chrooting does not substantially improve security under those conditions and do not affirmatively recommend it, but they do not explicitly discourage it.

  • Do not operate an open resolver, as any such are likely to be discovered and co-opted for use in DNS reflection attacks against third parties. Make use of BIND access control mechanisms such as address match lists to restrict recursive query service to known and authorized clients.
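As an illustration of the decision an address match list makes, here is a minimal Python sketch. It is a model of the idea only, not named.conf syntax and not BIND's implementation; in BIND itself this is typically expressed with an acl and the allow-recursion option. The networks listed are hypothetical documentation ranges, and the function name is an assumption made for this example.

```python
import ipaddress

# Hypothetical client networks (documentation ranges); substitute your own.
TRUSTED_NETWORKS = [
    ipaddress.ip_network("127.0.0.0/8"),
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def may_recurse(client_address, trusted=TRUSTED_NETWORKS):
    """Return True if the client address falls inside any trusted network."""
    address = ipaddress.ip_address(client_address)
    # A membership test across IP versions is simply False, so a list that
    # mixes IPv4 and IPv6 networks is safe to scan.
    return any(address in network for network in trusted)
```

A client at 192.0.2.53 would be offered recursion under this list, while one at 198.51.100.7 would not.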

  • Configure your network boundary routers/firewalls to drop packets arriving from outside your network that claim source addresses inside it. Among other things, this prevents someone outside your network from using your recursive servers as a reflector in an attack targeting your own systems.

  • Ensure that you have query port randomization enabled (it is enabled by default on all currently-supported versions of BIND - but make sure that you have not overridden it by specifying a specific source port for named to use when sending queries to authoritative servers).  A useful testing tool provided by the Domain Name System Operations Analysis and Research Center can be found here: https://www.dns-oarc.net/oarc/services/porttest
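If you prefer to check locally, one rough approach is to capture the source ports of queries your resolver sends to authoritative servers and look at how widely they are spread. The Python sketch below assumes you have already extracted those port numbers (for example from a packet capture); the function name and the returned fields are choices made for this example, and it does not reproduce the scoring used by the DNS-OARC tool.

```python
import statistics


def summarize_source_ports(ports):
    """Summarize how widely a sample of outbound query source ports is spread."""
    if not ports:
        raise ValueError("need at least one observed port")
    distinct = len(set(ports))
    return {
        "samples": len(ports),
        "distinct": distinct,
        # Population standard deviation: 0.0 when every query used one port.
        "stdev": statistics.pstdev(ports),
        # A single repeated port suggests a fixed query source port is configured.
        "fixed_port": distinct == 1,
    }
```

A sample in which every query left from port 53 would be reported with fixed_port set, which is the misconfiguration described above.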

  • We recommend configuring your recursive servers to use DNSSEC validation.  DNSSEC validation will prevent cache poisoning of records that are provided by DNSSEC-signed authoritative zones.

  • Consider deploying Response Rate Limiting (RRL).  This functionality is currently available as unsupported patches - but will be incorporated in future versions of BIND.  Although originally intended for use on authoritative servers, RRL is helpful if you have clients that can be used indirectly to attack third parties - for example via the open forwarding DNS servers found on some home CPE devices.  For more information on RRL see: http://www.redbarn.org/dns/ratelimits

  • Do not leak RFC 1918 zone queries to Internet nameservers.  See: What does "RFC 1918 response from Internet for 0.0.0.10.IN-ADDR.ARPA" mean?

  • Ensure (and confirm through testing) that your infrastructure supports EDNS0 and large UDP packet sizes.  See How to verify a clean network path for DNS resolution by recursive servers

  • Do not combine authoritative and recursive nameserver functions -- have each function performed by separate server sets.  This advice primarily concerns separation of public-facing authoritative services from internal client-facing recursive services - administrators may, for convenience, choose to serve some internal-only zones authoritatively from their recursive servers, having determined that the benefit outweighs any risks associated with this policy.

    If recursive and authoritative functions share one server, a problem that affects only authoritative service (for example, one that causes all of your authoritative servers to fail) will break your recursive service at the same time.

  • Run multiple, distributed recursive resolvers, avoiding single points of failure in critical resource paths. A variety of strategies are available (including anycast and load-balancing) to ensure robust geographic and network diversity in your deployment.  Those for whom high availability of DNS service is particularly critical may also wish to consider diversity of nameserver software versions and code base (e.g. running at least two different major versions of BIND on their servers, as well as DNS server software from other vendors).  See Which version of BIND do I want to download and install? for further discussion of this.  Note that care should be taken with monitored load-balanced configurations to ensure that, under high load, the servers are not all mistakenly taken offline because the increased load makes them less responsive.  The same can happen if one server in the pool genuinely fails, increasing the query load on the remaining servers.

  • Provision sufficient capacity to handle burst traffic up to at least 150% of normal level (see also the above point on load-balanced configurations - adequate overprovisioning will help to avoid some of the pitfalls).

    Remember that excess capacity must take into account not only server CPU and memory resources but also send and receive capacity along the entire network path.
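The arithmetic behind this rule of thumb can be sketched in a few lines of Python. The function names and the sample figures are assumptions made for this example; the 1.5 factor is the 150% guideline above, and the per-server calculation ties in the earlier point about a pool member failing.

```python
def required_capacity_qps(normal_peak_qps, burst_factor=1.5):
    """Capacity needed to absorb a burst of burst_factor times the normal peak."""
    return normal_peak_qps * burst_factor


def has_headroom(provisioned_qps, normal_peak_qps, burst_factor=1.5):
    """True if the provisioned capacity covers the burst target."""
    return provisioned_qps >= required_capacity_qps(normal_peak_qps, burst_factor)


def load_per_server(total_qps, servers, failed=0):
    """Queries per second each surviving server carries if load is spread evenly."""
    surviving = servers - failed
    if surviving <= 0:
        raise ValueError("no servers left to carry the load")
    return total_qps / surviving
```

For example, a site with a normal peak of 20,000 queries per second would aim for 30,000 under this guideline; if that burst arrives while one of four servers is down, each remaining server has to carry 10,000.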

  • Disable the use of stateful firewalls/packet filters on your servers for outbound query traffic (iterative queries made by a recursive server to authoritative Internet servers).  Administrators often consider the impact of stateful firewalls and load balancers on inbound client queries, but then fail to consider their impact on resolver query traffic.

  • Ensure that system outbound network buffers are large enough to handle your rates of outbound query traffic.  Some OS implementations (particularly some versions of Linux) by default assume low rates of outbound network traffic - but a recursive DNS server will have significant volumes of outbound traffic, both in responding to client queries, and in handling iteration on cache-misses.

  • Put in place monitoring scripts to continually check health of servers and alert if conditions change substantially.

    Conditions to monitor include:
    - process presence
    - CPU utilization
    - memory usage
    - network throughput and buffering (inbound/outbound)
    - filesystem utilization (on the log filesystem and also the filesystem containing the named working directory)
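A monitoring script along these lines might be sketched as follows in Python. This is only an outline under stated assumptions: the metric names and limits are hypothetical, the process check works on Unix-like systems only, and CPU, memory and network figures would have to come from whatever collector you already use.

```python
import os
import shutil


def filesystem_used_fraction(path):
    """Fraction (0.0 to 1.0) of the filesystem holding path that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def process_alive(pid):
    """Check process presence on Unix-like systems without disturbing it."""
    try:
        os.kill(pid, 0)  # signal 0 performs the existence check only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def find_alerts(metrics, limits):
    """Return the names of metrics whose value exceeds the configured limit."""
    return sorted(
        name for name, value in metrics.items()
        if name in limits and value > limits[name]
    )
```

The pid passed to process_alive would normally be read from the pid file that named writes, and filesystem_used_fraction would be called once for the log filesystem and once for the named working directory.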

  • By design, and for security purposes, the most common mode of failure for BIND is intentional process termination when it encounters an inconsistent state. An automated minder process capable of restarting BIND intelligently is recommended if you do not have 24-hour operations support (and possibly even if you do). It is especially helpful if any such script can checkpoint and archive the logs when this happens.

  • Logs should be examined periodically for error and warning messages, which may provide early warning of incipient problems before they become critical.
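One interpretation of restarting "intelligently" in the minder process recommended above is to stop restarting automatically once named is clearly crash-looping, so that an operator is alerted and the logs are preserved instead. The Python sketch below shows only that decision; the limits are hypothetical defaults, and the actual restart and log archiving steps are left to your own tooling.

```python
def should_restart(previous_restarts, now, max_restarts=3, window_seconds=600):
    """Decide whether another automatic restart is reasonable.

    previous_restarts holds the timestamps (in seconds) of earlier automatic
    restarts. Once max_restarts of them fall inside the window, stop and
    leave the decision to an operator.
    """
    recent = [t for t in previous_restarts if now - t <= window_seconds]
    return len(recent) < max_restarts
```

With these defaults, a third crash within ten minutes of two earlier restarts would still be restarted, but a fourth would not.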

  • Review the logging configuration to ensure it meets your requirements. BIND's logging defaults are generally sane (passing most of the work to syslog), but may not line up with organizational policy and/or desired data collection/retention standards.

  • When using size-limited files for logging, plan the size of the files and number to retain so that an increased level of logging due to a problem is unlikely to cause the logs from the start of the problem to become unavailable.  The exact settings will depend on how quickly problems can be detected and the details of the baseline retention policy.
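To make the sizing concrete, the following Python sketch estimates how many hours of history a set of size-limited log files holds at a given logging rate. The function name and the figures are hypothetical; measure your own baseline and incident logging rates before choosing values.

```python
MIB = 1024 * 1024


def hours_of_history(file_size_bytes, files_retained, bytes_logged_per_hour):
    """Hours of log history kept before the oldest file is overwritten."""
    if bytes_logged_per_hour <= 0:
        raise ValueError("logging rate must be positive")
    return file_size_bytes * files_retained / bytes_logged_per_hour


# Ten 50 MiB files at a baseline of 5 MiB/hour hold about 100 hours of logs.
baseline_hours = hours_of_history(50 * MIB, 10, 5 * MIB)
# If a problem raises the logging rate twentyfold, the same files hold 5 hours.
incident_hours = hours_of_history(50 * MIB, 10, 100 * MIB)
```

If problems typically take longer to detect than the incident figure, the start of the event will already have been overwritten, so either the file size or the number of files retained needs to grow.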

  • Query logging adds substantial overhead (on the order of 10x) and so should not be turned on without careful consideration.

  • Prior to any trouble, ensure that a strategy is in place for collecting post-mortem information if a server does encounter a problem. This includes:
    - Building named with debug symbols enabled
    - Enabling the BIND XML statistics channel for easy data collection.
    - Designing an appropriate logging strategy and reserving sufficient space on the log filesystem for information to be collected for a significant context period before an event (several hours at least, 24 hours+ preferred.)
    - Ensuring that the uid under which named is running has write permission sufficient to write a core image to its working directory if it segmentation faults and to write named.dump or named.run files if requested by operator.
    See What to do with a misbehaving BIND server and What to do if your BIND or DHCP server has crashed for guidance on troubleshooting problems and the type of information that is useful to collect in those circumstances.

  • In general BIND sets reasonable default limits on most options, but the default value for cache size is "unlimited."  Set an appropriate limit on max-cache-size to avoid growth without limit.  The maximum configurable value is currently 2^32 - 1 bytes, but it is possible to configure named to run without a limit on cache size, in which case its memory use can exceed 4 GB.  Additionally, provision enough system memory to allow storage of other BIND structures in addition to the resolver cache.
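One way to pick a starting value is to subtract a reserve for the operating system and named's other structures from the installed memory, then cap the result at the ceiling mentioned above. The Python sketch below does just that; the reserve is an assumption you need to size for your own system, and the ceiling is taken from this article and may differ in other BIND versions.

```python
GIB = 2**30

# Ceiling quoted in the text above; treat it as version-dependent.
MAX_CONFIGURABLE_CACHE_BYTES = 2**32 - 1


def plan_max_cache_size(system_memory_bytes, reserved_bytes):
    """Suggest a max-cache-size value in bytes, leaving reserved_bytes free."""
    if reserved_bytes >= system_memory_bytes:
        raise ValueError("reserve leaves no memory for the cache")
    available = system_memory_bytes - reserved_bytes
    return min(available, MAX_CONFIGURABLE_CACHE_BYTES)
```

On a 16 GiB machine with 4 GiB reserved the ceiling is the binding limit, whereas on a 4 GiB machine with 1 GiB reserved the suggestion is 3 GiB.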

  • Run a multi-threaded BIND build and launch named with an appropriate number of task threads tuned for the hardware and CPU architecture.
     

    Tuning is environment-specific

    System administrators may benefit from running tests with different values of -n (number of worker threads) and -U (from BIND 9.9 onwards, the number of listening tasks per socket) to confirm the optimum tunings for their architecture and typical query profile and load.  Particularly when the number of logical CPUs exceeds the number of physical CPUs, setting -n to the number of physical CPUs may improve throughput.  From BIND 9.9 onwards, the number of listener tasks per interface defaults to the value of -n, but administrators may see better performance, and in particular lower CPU overhead, with a value of -U between n/2 and n-1.
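When planning such test runs, it can help to enumerate the candidate settings up front. The Python sketch below lists the -U values in the suggested range for a given -n and builds the matching named argument list; the "named" account name is an assumption, and rounding n/2 up for odd values of n is a choice made for this example.

```python
def listener_candidates(worker_threads):
    """Candidate -U values between n/2 (rounded up) and n-1, never below 1."""
    if worker_threads < 1:
        raise ValueError("need at least one worker thread")
    low = max(1, (worker_threads + 1) // 2)
    high = max(low, worker_threads - 1)
    return list(range(low, high + 1))


def named_args(worker_threads, listeners, user="named"):
    """Build a named command line for one test run (user name is hypothetical)."""
    return ["named", "-u", user, "-n", str(worker_threads), "-U", str(listeners)]
```

For a machine with eight physical CPUs this yields test runs with -n 8 and -U set to 4, 5, 6 and 7 in turn, each of which should be measured under a representative query load.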

  • Observe query loads and cache utilization periodically to establish baseline expectations.  This will enable you to monitor for anything unusual - as defined by the range of 'normal' for your specific operational environment.
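A very simple way to turn such observations into a range of 'normal' is to take the mean of past samples plus or minus a few standard deviations. The Python sketch below uses that approach purely as an illustration; it is an assumption of this example, not a method prescribed here, and traffic with strong daily or weekly cycles usually needs a baseline per time of day.

```python
import statistics


def baseline_range(samples, k=3.0):
    """Return (low, high) as the sample mean plus or minus k standard deviations."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate spread")
    mean = statistics.mean(samples)
    spread = statistics.stdev(samples)
    return (mean - k * spread, mean + k * spread)


def is_unusual(value, samples, k=3.0):
    """True if value falls outside the baseline range derived from samples."""
    low, high = baseline_range(samples, k)
    return value < low or value > high
```

Given past readings of 100, 110, 90, 105 and 95 queries per second, a reading of 102 sits inside the range while a reading of 500 would be flagged.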

  • Run currently-supported version(s) of BIND in your environment. 

  • You should have a strategy that includes both a planned upgrade path, so that you can take advantage of improved features and functionality, and a plan for how you will respond if a security advisory is released that has the potential to impact your servers and services.  See Which version of BIND do I want to download and install? for more information.

  • The ISC BIND source includes a comprehensive unit test suite designed to test correct functional performance.  After using the configure script to tailor BIND for local conditions and desired optional components, run "make test" to verify that the build options you have selected will produce run-time code that functions correctly in your environment.

    Example for running the test suite (your configure options and choice of user privileges may not be identical):
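    # Configure and build with the optional components you need
    ./configure --with-openssl --enable-threads --with-libxml2
    make
    # Bring up the test network interfaces used by the test suite (needs root)
    sudo bin/tests/system/ifconfig.sh up
    make test
    # Take the test network interfaces down again
    sudo bin/tests/system/ifconfig.sh down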

  • Regarding performance testing: we find it unrealistic to recommend a general performance testing strategy because each organization has a differing mix of operational criteria. Real-world testing of recursive server performance is very complicated because of the difficulty of isolating variables contributed by network conditions and foreign server behavior. We believe that for recursive servers, observation and monitoring of servers in operation yields a more accurate characterization of their performance characteristics.

  • Our general advice for security practices is included in the list above. However, many large production environments with mission-critical DNS needs may opt to run servers on multiple hardware and/or OS platforms to increase the "eco-diversity" of their DNS infrastructure.  This also includes running different versions of BIND for resilience to potential defects that may not impact all currently supported versions.

(1) Linux support for the -u feature of BIND (dropping unnecessary permissions) requires a compatible kernel version.

Per the named man page:

On Linux, named uses the kernel's capability mechanism to drop all root privileges except the ability to bind(2) to a privileged port and set process resource limits.  Unfortunately, this means that the -u option only works when named is run on kernel 2.2.18 or later, or kernel 2.3.99-pre3 or later, since previous kernels did not allow privileges to be retained after setuid(2).



© 2001-2017 Internet Systems Consortium

