Collecting client queries for DNS server testing

Introduction

Testing DNS servers, particularly Recursive Servers (Resolvers), involves many variables which need to be taken into account. The best that you can do is to approximate the scenario in which a real production server would find itself. The results obtained from testing different scenarios or with different data might not apply to the behaviour and throughput of a production server with a different configuration, environment, or query pattern. The test results might also change over time as the environment on the Internet evolves. This is inevitable because live servers are interacting with other DNS servers over the Internet, and the responsiveness and availability of these cannot be guaranteed.

Also adding to the unpredictability are the sequencing of query responses, response times, and population of the cache. The same query in three different test runs could result in a cache hit, a cache miss needing just a single fetch, or a cache miss needing several fetches.

This ISC blog post covers the challenges in more detail:
https://www.isc.org/blogs/bind-resolver-performance-july-2021/

However, it's still worth attempting to test, and most usefully, to compare the same client query stream against different versions of BIND, or possibly against the same version of BIND with a different configuration.

Server operators, you will want to sample your own live client queries and replay those against your test environments.

It's also worth trying different query streams against the same version of BIND.

Server operators, you might want to compare your client queries from 12 months ago with what you're receiving now, or to compare a 'standard' query stream against a capture of queries that appear to be causing problems for your server(s).
DNS software providers will be interested in client query streams from different customer environments so that when they tune their resolver software (or suggest tunings), they're doing so based on real production data, not on synthesised query sets.

This article shares the recipe we use (and suggest to others) for effectively capturing client query streams whilst also randomising the client sources so that the data can be shared safely with other organisations without exposing client IP addresses.

The DNSCAP network capture utility

We use (and recommend) the DNSCAP network capture utility designed specifically for DNS traffic, maintained and made available for download and use by DNS-OARC (Domain Name System Operations Analysis and Research Center).

You can obtain the tool here. There are packaged as well as 'build your own' versions available:
https://www.dns-oarc.net/tools/dnscap

Using DNSCAP

These are the main options that you may/will use:

-z host IP address that the DNS resolver uses to receive client queries; repeat -z if it has more than one such address. This is crucial for filtering out queries from BIND itself to the Internet, which we don't want.

-i if name of the network interface receiving client queries ("any" also works, so you may not need to bother with explicit names, but see notes below)

-p ask for the interface not to be put into promiscuous mode; it's not needed as we want to capture only the traffic directed to this server

-s i use just 'i' (initiations) to capture only queries but not replies (thus making the output file smaller). NOTE: this has to be combined with -z above

-w base dump the captured packets to successive binary files in standard pcap(3) format. Each file will have a name like "%s.%s.%06u" where the first %s is base, second %s is the time as hours, minutes and seconds (%H%M%S), and %06u is the microseconds.

-C lim Maximum individual file size in bytes, 1 GiB recommended

-k 'zstd -9' compression command; this is our suggestion but feel free to change it if you'd prefer something different; not required

-B datetime start capture time
-E datetime stop capture time

-S print statistics to stderr when the packet capture file is closed (optional)

-6 enable/fix IPv6 support, omit for dnscap version 2.0.0 and newer

-T selects and includes DNS TCP packets (in addition to DNS over UDP)

-P ... This loads the plugin (see example below) for anonymizing IPv6 and also IPv4 addresses using a random AES key; the key is forgotten when the process exits

A good sample size is 10 hours, but shorter samples can also be useful; when testing, it is possible to combine samples from multiple sources, sites, etc.

Bonus points if you can get the capture command running in parallel on multiple servers, e.g. on 10 servers for 1 hour, or 5 servers for 2 hours, etc.

====

Here's an example of what other sites have used before:

dnscap \
-z 192.0.2.1 \
-z 2001:db8::1 \
-i any \
-p \
-s i \
-w /output/pcap \
-C 1073741824 \
-k 'zstd -9' \
-B '2022-01-01 11:40:00' \
-E '2022-01-01 21:40:00' \
-S \
-6 \
-T \
-P /usr/lib/dnscap/anonaes128.so \
    -4 \
    -K /dev/urandom \
    -I /dev/urandom

Obviously the -z settings will be customised to the environment in which dnscap will be run (and you can specify as many -z options as you need to cover all of the server IP addresses on which queries arrive).

Also don't forget to edit the start and stop capture times!
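
One way to avoid hand-editing the timestamps is to compute them. The following is a minimal sketch rather than part of our recipe: it assumes GNU date (for the -d option), fmt_time is a made-up helper name, and the 5 minute delay and 10 hour length are only illustrative values. The timestamps are printed in the shell's current time zone.

```shell
#!/bin/sh
# fmt_time EPOCH: print an epoch time as 'YYYY-MM-DD HH:MM:SS',
# the form used for -B and -E in the example above.
# Assumes GNU date; fmt_time is a hypothetical helper name.
fmt_time() {
    date -d "@$1" '+%Y-%m-%d %H:%M:%S'
}

now=$(date +%s)
start=$(fmt_time $((now + 300)))           # begin 5 minutes from now
stop=$(fmt_time $((now + 300 + 36000)))    # capture for 10 hours

# Show the two options, ready to paste into the dnscap command line.
printf '%s\n' "-B '$start' -E '$stop'"
```

In a wrapper script, "$start" and "$stop" could be passed straight to -B and -E instead of being printed.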

Note 1

-B and -E are the timestamps for when you want this to be active and capturing. This will result in the production of as many files of maximum size -C (1 GiB) as are needed to accommodate all of the captured and anonymised queries. If you're worried about potential disk space problems (are you?), we'd suggest a dry run over a short period to get a feel for what the output quantity is going to be like.

As a guideline, the large servers at a sizeable ISP that captured data using this method produced 2 to 6 files per server when running for 1 hour (they sampled 10 servers, for 1 hour each).
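
The result of a short dry run can be scaled up to a rough disk space estimate. This is only a sketch: estimate_bytes is a hypothetical helper, the dry run figures are invented for illustration, and the linear scaling assumes a constant query rate, which real traffic with its daily peaks will not have.

```shell
#!/bin/sh
# estimate_bytes DRY_RUN_BYTES DRY_RUN_SECONDS FULL_RUN_SECONDS
# Scale the output of a short dry run linearly to the full capture length.
# estimate_bytes is a hypothetical helper name.
estimate_bytes() {
    echo $(( $1 * $3 / $2 ))
}

gib=$((1024 * 1024 * 1024))    # 1 GiB, the -C value used in the example

# Invented dry run: 600 MiB written in 10 minutes; planned run of 10 hours.
dry_bytes=$((600 * 1024 * 1024))
total=$(estimate_bytes "$dry_bytes" 600 36000)

# Whole GiB, and the file count rounded up to complete 1 GiB files.
echo "projected: $((total / gib)) GiB in about $(( (total + gib - 1) / gib )) files"
```

Treat the figure as a lower bound and leave generous headroom, especially if the dry run was taken outside peak hours.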

Warning

If you're going to run this on multiple servers (and plan to combine the data from each of them), then don't use:

-K /dev/urandom \
-I /dev/urandom

You will need the same anonymisation key on all the servers - so do something like:

-k putrandomkeyhere \
-i putrandomkeyhere

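Where 'putrandomkeyhere' is a 16-character random string that you use on all the servers. Note that these lowercase -k and -i are options to the anonymisation plugin: they go after -P, in place of -K and -I, and are not the same as dnscap's own -k and -i options described earlier.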
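
If you need a way to produce such a string, the following sketch is one option. random_key is a hypothetical helper name, and restricting the string to letters and digits is our assumption here, made to avoid shell quoting problems; it leaves roughly 95 bits of randomness rather than the full 128.

```shell
#!/bin/sh
# Print a random 16-character string of letters and digits.
# random_key is a hypothetical helper name; the randomness comes
# from /dev/urandom.
random_key() {
    LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 16
    echo
}

key=$(random_key)

# Print only the length, so the key itself does not end up in
# terminal scrollback or logs.
echo "generated a key of ${#key} characters"
```

Generate the string once and use that same value on every server.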

You can also use a filename to hold the key - like this:

-K /tmp/k \
-I /tmp/k
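
A key file can be created along the following lines. This is a sketch under assumptions: make_key_file is a hypothetical helper name, and we assume the plugin takes 16 bytes from the file, just as it does from /dev/urandom in the earlier example. The umask keeps the file readable only by the user who created it, which matters in a shared directory such as /tmp.

```shell
#!/bin/sh
# make_key_file PATH: write 16 random bytes to PATH, creating the file
# with permissions that allow access by the current user only.
# make_key_file is a hypothetical helper name.
make_key_file() {
    ( umask 077 && head -c 16 /dev/urandom > "$1" )
}

keyfile=$(mktemp)          # a fresh private file instead of a fixed name
make_key_file "$keyfile"

# Confirm the size without displaying the key itself.
echo "wrote $(wc -c < "$keyfile" | tr -d ' ') bytes to $keyfile"
```

The same file then needs to be copied to each server before the capture starts, and deleting every copy once the captures are complete means the anonymisation can no longer be reversed.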

If you have a dedicated interface for inbound client queries (versus needing to specify both interface and IP addresses), then you can shortcut the -z and -i part of this and just use -i with the interface name.

For example, replace:

-z 192.0.2.1 \
-z 2001:db8::1 \
-i any \

with

-i yourinterfacename

Security Note if you are submitting packet captures to ISC for testing or problem replication

ISC can't reverse your client anonymisation without the key you used; we just want you to use the same one for all servers in each capture run. Even if we did have the key, we'd only be able to reverse the anonymisation for the IPv6 clients. (The reason is that an AES-encrypted IPv4 address is 128 bits long and gets truncated to 32 bits to fit back into the IPv4 address field in the PCAP, i.e. three quarters of the bits are lost, which defeats any decryption attempt.)