What to do if your BIND, Kea DHCP, Stork, or ISC DHCP server has crashed
  • 26 Jan 2024

If your BIND 9, Kea DHCP, Stork, or ISC DHCP server crashes (i.e., the daemon terminates unexpectedly), collecting the available evidence and submitting it to us is vital if we are to help diagnose the problem and provide a solution. Below is a list of files and information to collect after a crash. This includes the process core dump - if you don't have one, core dumps may need to be enabled on your system so that one can be collected if the crash happens again.

Things to check if a core file does not get generated:

  • Do you have enough available disk space? Core files can be quite large.
  • Do you have the correct permissions that allow a core file to be created in the process's working directory?
  • Is there already a directory or file named "core" in the working directory? If a file named "core" already exists but has multiple hard links, the kernel will not dump a core file.
  • What is the current file size limit?
  • Have you used ulimit -a to display a list of current system limits and ulimit -c unlimited to temporarily set the "core file size" value to "unlimited"? Make sure you do not have ulimit -c 0 in any shell startup scripts, e.g., .bashrc. Other files that may change ulimit include /etc/profile and /etc/init.d/functions.
  • Have you verified that /etc/security/limits.conf enables core dumps globally?
  • Are you running BIND as a non-root user? On many systems, core file creation is disabled by default for processes that change their user ID, as named does when started with -u. Use sysctl to allow such processes to dump core. On FreeBSD:
sysctl kern.sugid_coredump=1

(On Linux, including Debian, the equivalent setting is fs.suid_dumpable, which is also exposed as the /proc/sys/fs/suid_dumpable file.)

  • Are you running chrooted? You can test whether or not core dumps are possible by using gcore or kill -6 (SIGABRT) against a process at a time when restarting it will not impact production.

You can check the details of enabling core files for your environment via the man pages:

man core
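
The checks listed above can be partly automated. The following is a minimal sketch, not an ISC-provided tool: the function names are hypothetical, and it assumes a POSIX-style shell whose ulimit built-in accepts -c (as bash and dash do). It reports the core size limit, whether the name "core" is already taken, the free disk space, and (on Linux only) the kernel's core_pattern.

```shell
#!/bin/sh
# Sketch: report the settings that most often prevent a core file from being written.
# Function names are hypothetical; pass the daemon's working directory as $1.

# Classify a "ulimit -c" value: 0 disables core files entirely.
classify_core_limit() {
    case "$1" in
        0)         echo "disabled" ;;
        unlimited) echo "unlimited" ;;
        *)         echo "limited" ;;
    esac
}

# Report whether a file or directory named "core" already exists in a directory.
core_name_in_use() {
    if [ -e "$1/core" ]; then echo "yes"; else echo "no"; fi
}

workdir="${1:-.}"
echo "core size limit : $(ulimit -c) ($(classify_core_limit "$(ulimit -c)"))"
echo "'core' exists   : $(core_name_in_use "$workdir")"
echo "free space (KB) : $(df -Pk "$workdir" | awk 'NR==2 {print $4}')"
# Linux only: where the kernel sends core files (may be a pipe to a handler).
if [ -r /proc/sys/kernel/core_pattern ]; then
    echo "core_pattern    : $(cat /proc/sys/kernel/core_pattern)"
fi
```

Note that the limit shown is that of the shell running the script, not of a daemon that is already running; on Linux, /proc/<pid>/limits shows the limits that apply to a running process.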

Information and files to collect and preserve

Please collect and preserve the details below (in particular the logs and any core files which may be lost if not collected and copied/moved elsewhere right away). We may need some or all of them to diagnose the problem.

  1. Note down and tell us what BIND 9, Kea, Stork, or ISC DHCP was doing at the time - for example, is this a test or a production environment? If testing, how is the test being run? If in production, was there anything specific happening at the time, such as configuration updates or similar maintenance activities?

  2. Look for a core dump. Generally this will be in either the current working directory of the server program, or in the directory of the binary. Depending on how core dumping is managed on that system, it will simply be named core, or it may have a more esoteric name, possibly involving the PID of the process that died.

    You can check where a core has come from by using the file command:

file core

core: ELF 32-bit MSB core file, SPARC, version 1 (SYSV), SVR4-style, from 'named'

On macOS, core files are in the directory /cores and are suffixed with the PID of the aborting process (e.g., /cores/core.<pid>).

On some systemd-based Linux distributions, systemd-coredump is enabled by default; it automatically collects and stores core dumps when an application crashes. When systemd-coredump is enabled, you can list the collected core dumps by running coredumpctl list. A core dump can be saved, if available, by running coredumpctl -o ./named.core dump /usr/sbin/named, where -o specifies the output file and the last argument is the binary of the relevant application.

  3. Send us the actual binary that generated the core; we need it in order to read the core on another machine.

    Note: If the binary was built without any debug information - particularly if it doesn't have the procedure names available for stack traces - you can sometimes get around this by building a new binary with compile option -g from the same source code bundle. This only works if all other build/configure options are the same as in the original build, and if the compiler/linker/optimizer software hasn't been updated in the build environment since the original binary was created.

  4. Collect any libraries that the binary loaded dynamically from the run-time environment that produced the core.

    Use ldd for a first pass at what's needed. Sometimes we need more libraries than ldd shows at first; they are only exposed when we try to read the core for the first time. The library most often omitted, which we then have to request afterwards, is libpthread:

ldd /usr/sbin/named

     linux-vdso.so.1 =>  (0x00007fff11d4c000)
     libcrypto.so.0.9.8 => /usr/lib64/libcrypto.so.0.9.8 (0x00007f19d4d19000)        
     libc.so.6 => /lib64/libc.so.6 (0x00007f19d49c0000)        
     libdl.so.2 => /lib64/libdl.so.2 (0x00007f19d47bc000)        
     libz.so.1 => /lib64/libz.so.1 (0x00007f19d45a6000)        
     /lib64/ld-linux-x86-64.so.2 (0x00007f19d5096000)

There is no ldd command in macOS.

Instead, otool -L should provide the same functionality; for example:

$ otool -L /usr/local/sbin/named

  5. Note the environment information - OS and version, hardware, number of CPUs, memory size, BIND/DHCP/Kea version, and configure/compile/link options. For BIND, use named -V; for DHCP, dhcpd --version; and for Kea, kea-dhcp4 -V.

  6. Please also collect the following files:

  • Configuration files: named.conf / dhcpd.conf / kea.conf (as appropriate). Key material can be obscured if you prefer.
Use named-checkconf to obscure your key material from named.conf

The named-checkconf tool has an option -p that outputs the configuration file in canonical format (after checking it). A second option -x obscures secrets by replacing them with strings of question marks ("?"). This means that the following command parses and outputs your named.conf in a format suitable for sharing without exposing key material:

$ named-checkconf -px

The filename to be checked defaults to /etc/named.conf; to explicitly apply named-checkconf to a configuration file in another location, use the following format:

$ named-checkconf -px filename

If you normally run named chrooted, you may need additional permissions to run named-checkconf, and you must specify the full path to the named.conf file that is used when you launch named.

  • Any include files that are declared in the main configuration files (look for nested includes too!).
  • The leases file (for ISC DHCP/Kea only, or possibly a database dump for Kea; see the kea-admin(8) manual page).
  • Zone data files (for BIND only) - although these may not be needed initially, so please ask.
  7. Gather all the relevant logfiles leading up to and covering the period of the incident.

  8. Think about what might have changed in your environment prior to experiencing this problem. For example:

    • Software upgrade (your BIND 9, Kea DHCP, Stork, or ISC DHCP software)
    • Operating system patches
    • Configuration changes (OS- or product-related)
    • Loading changes - additional clients, for example
    • Networking changes - server moves, network topology changes, etc.
    • Firewall updates or configuration changes
    • Seasonal usage changes

  9. If the failures are recurring, is there any pattern to them? For example:

    • Do they occur at the same time of week/day/hour each time? (If so, what else is going on at this time?)
    • Do they always occur at server-loading peaks?
    • Are they related to a specific server operation such as a zone transfer (BIND 9) or OMAPI update (ISC DHCP)?
    • Do they always occur after a specific interval (days/hours) since the last restart?
    • Are they increasing or decreasing in frequency?
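
The binary and library collection steps in the list above lend themselves to a small script. The sketch below is illustrative only: the function names and the ./evidence default are hypothetical, and it assumes a Linux-style ldd (on macOS, otool -L output has a different format and would need its own parsing). As noted in the list, ldd may not show every library that is eventually needed.

```shell
#!/bin/sh
# Sketch: gather a binary and the shared libraries that ldd resolves for it
# into one directory. Function names and default paths are hypothetical.

# Print the absolute library paths found in "ldd" output read from stdin.
# Entries with no path on disk, such as linux-vdso.so.1, are skipped.
ldd_paths() {
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^\//) { print $i; break } }'
}

# Copy a binary and its libraries into a destination directory.
collect_binary_and_libs() {
    binary="$1"; dest="$2"
    mkdir -p "$dest/libs" || return 1
    cp -p "$binary" "$dest/" || return 1
    ldd "$binary" | ldd_paths | while IFS= read -r lib; do
        cp -p "$lib" "$dest/libs/"
    done
}

# Run only when a binary is named on the command line, for example:
#   sh collect.sh /usr/sbin/named ./evidence
if [ "$#" -ge 1 ]; then
    collect_binary_and_libs "$1" "${2:-./evidence}"
fi
```

Keeping the parsing in its own function makes it easy to check against saved ldd output before running the copy on a production host.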

What to include when reporting the problem

If you are submitting a bug report or support ticket, it is vital that you collect and preserve as much of the information and as many of the files requested above as possible. Without them, it may not be possible to diagnose the cause of the problem.

However, in the initial report, we prefer just to receive the basic details, and will let you know what else we need once we've reviewed them. Please include these basics with any initial contact:

  • Environment information (see items 5 and 8, above), including the BIND 9/ISC DHCP/Kea server version.
  • Frequency and impact to your servers or production services of the incident(s) you have experienced or are continuing to experience.
  • What named or dhcpd or the relevant modular kea-* process was doing at the time (see item 1 above).
  • Extracts from system and application logfiles leading up to and including the incident/crash.
  • If at all possible, a debugger backtrace of the crash, generated on the server that produced it.

If you have a core file and gdb on the server that produced the core, you can obtain a backtrace (a snapshot of what each thread was doing and the nested layers of procedure calls and data that was being used/accessed at the time) by launching gdb as follows:

$ <path-to>/gdb <binary(full path)> <core(full path)>

And then from gdb, type:

> thread apply all bt full

(Sometimes this will return an error and fail to complete - if that happens, retry it, omitting 'full'.)

Please include all of the output from gdb from when it is launched, not just the output from the thread apply all bt full command.
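
If an interactive session is inconvenient, the same gdb command can be run non-interactively. This is a sketch under stated assumptions rather than a recommended procedure: the function name and file names are hypothetical, and it relies on gdb's -batch and -ex options. The output is redirected to a file so that the messages gdb prints while loading the core are kept together with the backtrace.

```shell
#!/bin/sh
# Sketch: capture a full backtrace non-interactively. The function name and
# the example file names are hypothetical.

# Usage: save_backtrace <binary(full path)> <core(full path)> <output file>
save_backtrace() {
    binary="$1"; core="$2"; out="$3"
    if [ ! -r "$binary" ] || [ ! -r "$core" ]; then
        echo "binary or core file is not readable" >&2
        return 2
    fi
    # -batch makes gdb exit after running the -ex command; 2>&1 keeps gdb's
    # warnings and core-loading messages in the same file as the backtrace.
    gdb -batch -ex "thread apply all bt full" "$binary" "$core" > "$out" 2>&1
}

# Run only when all three arguments are supplied, for example:
#   sh backtrace.sh /usr/sbin/named /var/named/core.1234 named-backtrace.txt
if [ "$#" -eq 3 ]; then
    save_backtrace "$1" "$2" "$3"
fi
```

If the command fails partway through, the same retry applies as in the interactive case: replace "bt full" with "bt" in the -ex argument.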

Note:

macOS users whose system has not generated a core file may still have access to useful crash data: go to Applications/Console and look at "User Diagnostic Reports," where you may see a report.

Uploading large files to ISC

We prefer to receive large and non-text files compressed and bundled using tar and gzip or bzip2. Windows users can use zip instead. Large files cannot be accepted as email attachments, so please contact us for alternative upload arrangements if you need to submit them with bug or support tickets.
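
As an illustration of the bundling described above, the following sketch packs a directory of collected files into a single gzip-compressed tar archive. The function name and the "evidence" directory name are hypothetical; only standard tar and gzip options are used.

```shell
#!/bin/sh
# Sketch: bundle a directory of collected files into one .tar.gz archive.
# The function name and the example paths are hypothetical.

# Usage: bundle_dir <directory> <archive.tar.gz>
bundle_dir() {
    dir="$1"; archive="$2"
    [ -d "$dir" ] || return 1
    # -C changes to the parent directory first, so the archive contains
    # paths relative to it (evidence/...) rather than absolute paths.
    tar -cf - -C "$(dirname "$dir")" "$(basename "$dir")" | gzip -c > "$archive"
}

# Run only when both arguments are supplied, for example:
#   sh bundle.sh ./evidence ./evidence.tar.gz
if [ "$#" -eq 2 ]; then
    bundle_dir "$1" "$2"
fi
```

Listing the archive afterwards with gzip -dc evidence.tar.gz | tar -tf - is a quick way to confirm that the core file, binary, and logs were all included before uploading.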

Bug reporting

Before submitting a bug report please ensure first that you are running a current version. Also, some packaged versions of BIND 9, Kea DHCP, and ISC DHCP might be built with source code that has been modified by the distributor. In those cases, while it may be useful to report the issue directly to ISC (particularly if it might be a potential security issue), we may not be able to successfully diagnose the root cause of the problem if it cannot be reproduced with binaries that have been built directly from source code downloaded from ISC.

To report a bug, please use our Bug Report Form.

ISC support customers:

If you have a support subscription with us, please contact us first through our support portal, rather than filing a bug report.

Reporting security issues

DNS and DHCP security are critical to the Internet infrastructure. If you think you may be seeing a potential security vulnerability (for example, a crash with REQUIRE, INSIST, or ASSERT failure), please visit the Reporting Security Vulnerabilities page on our website for instructions; do not post about it on any public mailing list. Please also see our Security Vulnerability Disclosure Policy for details on how we publish security vulnerabilities.