Best Practices for those running Authoritative Servers


DRAFT ARTICLE "under construction"
... your feedback is encouraged and warmly welcomed!

The following is general advice for operating authoritative-only nameservers.  Treat the items below as starting points for determining the settings and controls most appropriate for your environment, depending on your size, operational needs and security concerns.  They are not a complete set of recommendations for all environments.  You should also consider operational best practices that are not specific to DNS, such as BCP 38 (RFC 2827) and BCP 84 (RFC 3704).  The DNS-specific case is discussed in BCP 140 (RFC 5358).

  • It is strongly recommended that you run BIND on a server dedicated to DNS only.  Reasons include:
    • Minimized risk of impact to DNS services as a result of other applications consuming server resources (perhaps due to an attack on those services, or due to application error).
    • Conversely, minimized risk to other applications as a result of BIND consuming all system or network resources.
    • Reduced likelihood of unauthorized access to the DNS server (e.g. via a code defect and root access exploit made possible via another application).
    • Improved ability to monitor DNS server performance (since the server is dedicated to one service).
    • Improved ability to troubleshoot problems.

  • Run BIND as an unprivileged user.

    To open low-numbered UDP and TCP ports, BIND must be launched as root, but an alternate uid can be specified using the -u command line argument; after opening the resources it needs, named changes its runtime uid to that unprivileged account. (Please see note (1) at the end of this document concerning use of this feature under Linux.)

  • If you follow the preceding advice (running BIND as an unprivileged user on a dedicated server), chrooting is "de-emphasized." Our operations experts feel that chrooting does not substantially improve security under those conditions; they do not affirmatively recommend it, but neither do they explicitly discourage it.

  • Make use of BIND access control mechanisms such as address match lists to restrict recursive query service to known and authorized clients.  Ideally your Internet-facing authoritative servers should not perform recursion for any clients at all.

  • Consider DNSSEC-signing your public authoritative zones.  (Recursive servers will then be able to use DNSSEC validation to authenticate your records.)

  • Consider deploying Response Rate Limiting (RRL).  This functionality is available in BIND 9.9.4 and newer.  For more information, see the article "A Quick Introduction to Response Rate Limiting".

  • Ensure (and confirm through testing) that your infrastructure supports EDNS0 and large UDP packet sizes.
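
    One way to test this is to send your server a query that advertises a large EDNS0 buffer and see whether a large answer comes back intact.  The following is a minimal sketch in Python, not a definitive test tool.  The function names, the 4096-byte payload size and the choice of a DNSKEY query are our assumptions for illustration; the packet layout follows RFC 6891.

```python
import socket
import struct

OPT_TYPE = 41      # EDNS0 OPT pseudo-record type (RFC 6891)
DNSKEY_TYPE = 48   # DNSKEY answers from signed zones tend to be large
DO_BIT = 0x8000    # "DNSSEC OK" flag carried in the OPT TTL field
TC_BIT = 0x0200    # "truncated" flag in the DNS header


def build_edns_query(name, qtype=DNSKEY_TYPE, payload_size=4096, query_id=0x1234):
    """Return a DNS query for `name` that advertises `payload_size` via EDNS0."""
    # Header: id, flags (RD clear, since we are asking an authoritative server),
    # QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=1 (the OPT record).
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, 1)
    labels = [label for label in name.rstrip(".").split(".") if label]
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # class IN
    # OPT record: root owner name, type 41, class = UDP payload size,
    # TTL = extended rcode/version/flags, zero-length RDATA.
    opt = b"\x00" + struct.pack("!HHIH", OPT_TYPE, payload_size, DO_BIT, 0)
    return header + question + opt


def probe(server, name, timeout=3.0):
    """Send the query over UDP; return (reply_length, truncated_flag)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_edns_query(name), (server, 53))
        reply, _ = sock.recvfrom(65535)
    flags = struct.unpack("!H", reply[2:4])[0]
    return len(reply), bool(flags & TC_BIT)
```

    For example, probe("192.0.2.53", "example.com") (both values are placeholders) returns the reply size and whether the server set the truncated flag.  A reply well over 512 bytes with the flag clear suggests that large UDP responses survive the path from that vantage point; a timeout or a truncated reply is a prompt to look at firewalls and fragment handling along the way.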

  • Consider the length of the TTLs on the delegation records that you manage within your zones, as well as those provided by the parent zones that delegate authority to your nameservers.  Longer TTLs protect the visibility of a zone (for example, when the parent zone's nameservers are under attack), but shorter ones allow for a faster change of nameservers.

  • Do not combine authoritative and recursive nameserver functions; have each function performed by a separate set of servers.  This advice primarily concerns separating public-facing authoritative services from internal client-facing recursive services.  Administrators may, for convenience, choose to serve some internal-only zones authoritatively from their recursive servers, having determined that the benefit outweighs any risks associated with this policy.

    If one server performs both recursive and authoritative functions, then a problem that affects only authoritative service (for example, one that causes all of your authoritative servers to fail) will break your recursive service at the same time.

  • Run multiple, distributed authoritative servers, avoiding single points of failure in critical resource paths. A variety of strategies are available (including anycast and load-balancing) to ensure robust geographic and network diversity in your deployment.  (Note that care should be taken with monitored load-balanced configurations to ensure that, under high load, all servers are not mistakenly taken offline because the increased load has reduced their responsiveness.  This can also happen if one server in the pool genuinely fails, thus increasing the query load on the remaining servers.)

  • Provision sufficient capacity to handle burst traffic of up to 20 times the normal level.  (See also the point above on load-balanced configurations; adequate overprovisioning will help to avoid some of the pitfalls.)

    Remember that excess capacity must take into account not only server CPU and memory resources but also send and receive capacity along the entire network path.
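
    As a rough illustration of the arithmetic, the sketch below turns a normal query rate into a burst target and an approximate outbound bandwidth figure.  The function name, the sample rate and the average response size are hypothetical; only the 20x factor comes from the advice above.  The bandwidth figure counts DNS payload only and ignores UDP/IP header overhead.

```python
def burst_capacity(normal_qps, avg_response_bytes, burst_factor=20):
    """Return (burst_qps, outbound_megabits_per_second) for a burst target."""
    burst_qps = normal_qps * burst_factor
    # bytes/s -> bits/s -> megabits/s (DNS payload only)
    outbound_mbps = burst_qps * avg_response_bytes * 8 / 1_000_000
    return burst_qps, outbound_mbps


if __name__ == "__main__":
    qps, mbps = burst_capacity(normal_qps=5000, avg_response_bytes=500)
    print(f"plan for {qps} queries/s and about {mbps:.0f} Mbit/s outbound")
```

    Substitute measured values from your own servers; signed zones in particular can push the average response size well above the figure assumed here.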

  • In most instances we would not recommend the use of inbound packet filtering for authoritative nameservers; Response Rate Limiting is the recommended solution.  However, there are some circumstances where filtering at very high inbound packet rates can be helpful.  Please contact ISC if you think you might benefit from our operational experience in this area.

  • Ensure that system outbound network buffers are large enough to handle your rates of outbound response traffic.  Some OS implementations (particularly some versions of Linux) assume low rates of outbound network traffic by default, but an authoritative server will often be responding with significantly larger packets than the queries it received, particularly for signed zones.
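
    A quick way to see what the operating system grants is to ask it directly.  The sketch below (the function name is ours) reports the default send buffer of an ordinary UDP socket and, optionally, what the kernel actually grants when a larger size is requested.  It shows system defaults and limits as seen by a test socket, not the sockets that named itself opens.

```python
import socket


def udp_send_buffer(requested=None):
    """Return (default_bytes, granted_bytes) for a fresh UDP socket.

    granted_bytes is None unless a size is requested.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        default = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
        granted = None
        if requested is not None:
            # The kernel may clamp or adjust the value it is asked for.
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested)
            granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
        return default, granted


if __name__ == "__main__":
    print(udp_send_buffer(requested=4 * 1024 * 1024))
```

    If the granted size is far below the size requested, the system-wide ceiling is probably the limiting factor; on Linux this is typically governed by the net.core.wmem_max sysctl.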

  • Put in place monitoring scripts to continually check the health of your servers and alert if conditions change substantially.

    Conditions to monitor include:
    - process presence
    - CPU utilization
    - memory usage
    - network throughput and buffering (inbound/outbound)
    - filesystem utilization (on the log filesystem and also the filesystem containing the named working directory)
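
    A minimal sketch of such a check for Unix-like systems, using only the Python standard library, might look like the following.  It covers process presence, CPU load and filesystem utilization; memory and network counters are platform-specific and are left out.  The function names, the thresholds and the pid-file path in the usage note are assumptions to adapt to your installation.

```python
import os
import shutil


def process_alive(pid_file):
    """Return True if the PID recorded in pid_file refers to a live process."""
    try:
        with open(pid_file) as fh:
            pid = int(fh.read().strip())
    except (OSError, ValueError):
        return False
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 checks for existence; nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def filesystem_used_fraction(path):
    """Return the used fraction (0.0 to 1.0) of the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total if usage.total else 0.0


def check_health(pid_file, paths, max_fs_fraction=0.9, max_load_per_cpu=1.0):
    """Return a list of alert strings; an empty list means all checks passed."""
    alerts = []
    if not process_alive(pid_file):
        alerts.append("named process not found")
    load_per_cpu = os.getloadavg()[0] / (os.cpu_count() or 1)
    if load_per_cpu > max_load_per_cpu:
        alerts.append(f"1-minute load per CPU is {load_per_cpu:.2f}")
    for path in paths:
        used = filesystem_used_fraction(path)
        if used > max_fs_fraction:
            alerts.append(f"{path} is {used:.0%} full")
    return alerts
```

    A call such as check_health("/var/run/named/named.pid", ["/var/log", "/var/named"]) (all three paths are placeholders) could be run from cron, with any non-empty result forwarded to your alerting system.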

  • By design, and for security purposes, the most common mode of failure for BIND is intentional process termination when it encounters an inconsistent state. An automated minder process capable of restarting BIND intelligently is recommended if you do not have 24-hour operations support (and possibly even if you do). It is especially helpful if any such script can checkpoint and archive the logs when this happens.

  • Logs should be examined periodically for error and warning messages which may provide a tip-off for incipient problems before they become critical.

  • Review the logging configuration to ensure it meets your requirements. BIND's logging defaults are generally sane (passing most of the work to syslog), but may not line up with organizational policy and/or desired data collection/retention standards.

  • When using size-limited files for logging, plan the size of the files and number to retain so that an increased level of logging due to a problem is unlikely to cause the logs from the start of the problem to become unavailable.  The exact settings will depend on how quickly problems can be detected and the details of the baseline retention policy.
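
    The sizing question reduces to simple arithmetic, sketched below.  The function name and the sample figures are hypothetical; the result is the number of size-limited files to retain (the "versions" value of a BIND file logging channel) so that the start of a problem is still on disk when it is detected.

```python
import math


def log_versions_needed(baseline_bytes_per_hour, surge_factor,
                        hours_to_detect, file_size_bytes):
    """Return how many files of file_size_bytes cover the detection window."""
    surge_bytes = baseline_bytes_per_hour * surge_factor * hours_to_detect
    return math.ceil(surge_bytes / file_size_bytes)


if __name__ == "__main__":
    MB = 1024 * 1024
    # Assumed figures: 50 MB/hour normally, logging grows 10x during an
    # incident, problems are noticed within 6 hours, files capped at 100 MB.
    print(log_versions_needed(50 * MB, 10, 6, 100 * MB))
```

    Add your baseline retention requirement on top of this figure, and confirm that the log filesystem can hold the total.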

  • Query logging adds substantial overhead (on the order of 10x) and so should not be turned on without careful consideration.

  • Prior to any trouble, ensure that a strategy is in place for collecting post-mortem information if a server does encounter a problem. This includes:
    - Building named with debug symbols enabled.
    - Enabling the BIND XML statistics channel for easy data collection.
    - Designing an appropriate logging strategy and reserving sufficient space on the log filesystem for information to be collected for a significant context period before an event (several hours at least; 24 hours or more preferred).
    - Ensuring that the uid under which named is running has sufficient write permission to write a core image to its working directory if it suffers a segmentation fault, and to write named.dump or files if requested by the operator.
    See "What to do with a misbehaving BIND server" and "What to do if your BIND or DHCP server has crashed" for guidance on troubleshooting problems and the type of information that is useful to collect in those circumstances.

  • Run a multi-threaded BIND build and launch named with an appropriate number of task threads tuned for the hardware and CPU architecture.

    Tuning is environment-specific.

    System administrators may benefit from running tests with different values of -n (number of worker threads) and -U (from BIND 9.9 onwards, the number of listening tasks per socket) to confirm the optimum tuning for their architecture and typical query profile and load.  Particularly when the number of logical CPUs exceeds the number of physical CPUs, setting -n to the number of physical CPUs may improve throughput.  From BIND 9.9 onwards, the number of listener tasks per interface defaults to the value of -n, but administrators may see improved performance and reduced CPU overhead with a value of -U between n/2 and n-1.
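
    To make the test matrix concrete, the sketch below lists the (-n, -U) pairs suggested by the guidance above for a given number of physical CPUs.  The function name is ours, and rounding n/2 down is an assumption; the pairs are candidates to benchmark, not recommended settings.

```python
def tuning_candidates(physical_cpus):
    """Return (n, U) pairs with n = physical CPUs and U from n/2 up to n-1."""
    n = physical_cpus
    low = max(1, n // 2)    # n/2, rounded down, but never below one listener
    high = max(1, n - 1)
    return [(n, u) for u in range(low, high + 1)]


if __name__ == "__main__":
    for n, u in tuning_candidates(8):
        print(f"candidate: named -n {n} -U {u}")
```

    Run the same query load against each candidate and compare throughput and CPU use before settling on values for production.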

  • Observe query loads periodically to establish baseline expectations.  This will enable you to monitor for anything unusual, as defined by the range of 'normal' for your specific operational environment.
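
    One simple way to turn periodic observations into a working definition of "normal" is to record the mean and standard deviation of the query rate and flag values that fall far outside them.  The sketch below is illustrative only: the function names and the three-standard-deviation threshold are our assumptions, and real traffic with strong daily cycles may call for a separate baseline per time of day.

```python
import statistics


def baseline(samples):
    """Return (mean, standard deviation) of observed queries-per-second samples."""
    return statistics.mean(samples), statistics.stdev(samples)


def is_unusual(value, mean, stdev, k=3.0):
    """Return True if value lies more than k standard deviations from the mean."""
    return abs(value - mean) > k * stdev


if __name__ == "__main__":
    observed = [100, 110, 90, 105, 95]  # hypothetical samples
    mean, stdev = baseline(observed)
    print(is_unusual(500, mean, stdev), is_unusual(104, mean, stdev))
```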

  • Run currently-supported version(s) of BIND in your environment. 

  • You should have a strategy that includes both a planned upgrade path, so that you can take advantage of improved features and functionality, and a plan for how you will respond if a security advisory is released that has the potential to impact your servers and services.  See "Which version of BIND do I want to download and install?" for more information.

  • The ISC BIND source includes a comprehensive unit test suite designed to test correct functional performance.  After using the configure script to tailor BIND for local conditions and desired optional components, run "make test" to verify that the build options you have selected will produce run-time code that functions correctly in your environment.

    Example for running the test suite (your configure options and choice of user privileges may not be identical):
    ./configure --with-openssl --enable-threads --with-libxml2
    sudo bin/tests/system/ up
    make test
    sudo bin/tests/system/ down

  • Our general advice for security practices is included in the list above. However, many large production environments with mission-critical DNS needs may opt to run servers on multiple hardware and/or OS platforms to increase the "eco-diversity" of their DNS infrastructure.  This can also include running different versions of BIND for resilience against potential defects that do not affect all currently supported versions.

(1) Linux support for the -u feature of BIND (dropping unnecessary permissions) requires a compatible kernel version.

Per the named man page:

On Linux, named uses the kernel's capability mechanism to drop all root privileges except the ability to bind(2) to a privileged port and set process resource limits.  Unfortunately, this means that the -u option only works when named is run on kernel 2.2.18 or later, or kernel 2.3.99-pre3 or later, since previous kernels did not allow privileges to be retained after setuid(2).