Linux connection tracking and DNS

Question:

My busy Linux-based nameserver is giving unreasonably slow responses. How do I know if Linux connection tracking is causing the problem I am having?

Answer:

If you are seeing slow responses and timeouts from your nameserver, check its kernel log output ("dmesg" is one way to do this). You might find hundreds of entries similar to this:

nf_conntrack: table full, dropping packet

If so, this article is definitely for you: read on. If not, and you are running BIND named on GNU/Linux, it won't hurt to read on anyway. (Lack of the problem could mean that your nameserver is just not busy enough ... yet.)
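If you would rather count these messages than scroll through them, here is a minimal sketch. It is an illustration, not an ISC tool. It assumes only that the message contains "conntrack" followed by "table full". The exact prefix varies between kernel versions (nf_conntrack on newer kernels, ip_conntrack on older ones). Remember that dmesg reads the kernel ring buffer, which has a fixed size, so older messages may already have been overwritten. On some systems reading it also requires root.

```python
import re
import subprocess

# Matches "nf_conntrack: table full, dropping packet" and older variants;
# only the common "conntrack ... table full" part is required.
TABLE_FULL = re.compile(r"conntrack.*table full", re.IGNORECASE)


def count_table_full(log_text: str) -> int:
    """Count kernel log lines that report a full conntrack table."""
    return sum(1 for line in log_text.splitlines() if TABLE_FULL.search(line))


def read_dmesg() -> str:
    """Return the kernel ring buffer (may need root if dmesg is restricted)."""
    return subprocess.run(
        ["dmesg"], capture_output=True, text=True, check=True
    ).stdout


if __name__ == "__main__":
    hits = count_table_full(read_dmesg())
    print(f"{hits} 'table full' messages in the kernel log")
```

A count of zero only covers what is still in the ring buffer. Your distribution's persistent kernel log may hold older entries.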

Netfilter connection tracking

Linux Netfilter connection tracking is a very powerful resource for firewall engineers and system administrators. But on (or in front of) a nameserver, there is generally no point in tracking UDP DNS queries. Also, Linux kernel defaults for the size of the connection tracking table are unreasonably low for a busy router or nameserver.

"But UDP is connectionless, how can you track connections?" Yes, that is true. For you and me to communicate via UDP, you throw a packet at me, and I throw one back at you. The protocol has no means to establish a "connection" nor to verify for either of us that the other received the sent packet(s).

Netfilter connection tracking, however, is protocol-agnostic. A "connection" is simply a flow identified by its protocol, source address[:port] and destination address[:port]. Once packets have been seen going in both directions, conntrack treats the flow as an established connection.
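To make the idea concrete, here is a toy model of how a tracker can treat two unrelated UDP packets as one "connection". The class and method names (FlowTuple, ToyConntrack, see) are hypothetical. This is not how the kernel implements it, since the real code uses a hash table and far more state. The model only shows the core rule: a reply carries the original tuple with source and destination swapped.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class FlowTuple:
    """One direction of a flow: protocol plus source and destination."""
    proto: str
    src: str
    sport: int
    dst: str
    dport: int

    def reversed(self) -> "FlowTuple":
        """The tuple a reply packet would carry (endpoints swapped)."""
        return FlowTuple(self.proto, self.dst, self.dport, self.src, self.sport)


class ToyConntrack:
    """Toy tracker: one entry per flow, keyed by its original-direction tuple."""

    def __init__(self) -> None:
        self.entries: Dict[FlowTuple, str] = {}

    def see(self, pkt: FlowTuple) -> str:
        if pkt in self.entries:            # more traffic in the same direction
            return self.entries[pkt]
        original = pkt.reversed()
        if original in self.entries:       # a reply to a flow we already know
            self.entries[original] = "ESTABLISHED"
            return "ESTABLISHED"
        self.entries[pkt] = "NEW"          # first packet of a new flow
        return "NEW"
```

With a DNS query from 192.0.2.10:40000 to 198.51.100.53:53, the query is NEW and the answer coming back makes the single entry ESTABLISHED. That entry then sits in the table until it expires.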

Conntrack and DNS in UDP

Protocols which use UDP transport sometimes provide a means in the higher-level protocol to track communication. In the case of DNS, a client (resolver) sends an ID number in each query, so the software can use that (in addition to the source/destination IP addresses and ports) to match queries with the answers received.
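The ID matching described above can be sketched with nothing but Python's standard library. The sketch builds a minimal query (a 12-byte header plus one question), then checks that a reply has the QR (response) bit set and echoes the query's 16-bit ID. The function names are hypothetical. Real resolvers check much more, including the addresses, the ports and the question section.

```python
import secrets
import struct


def build_query(qname: str, qtype: int = 1) -> bytes:
    """Build a minimal DNS query for qname (qtype 1 = A, class IN)."""
    query_id = secrets.randbelow(0x10000)      # random 16-bit ID
    # Header: ID, flags (0x0100 = RD), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in qname.rstrip(".").split(".")
    )
    question = labels + b"\x00" + struct.pack("!HH", qtype, 1)
    return header + question


def message_id(msg: bytes) -> int:
    """The ID is the first 16-bit field of every DNS message header."""
    return struct.unpack("!H", msg[:2])[0]


def is_response_to(query: bytes, reply: bytes) -> bool:
    """True if reply has the QR bit set and carries the query's ID."""
    flags = struct.unpack("!H", reply[2:4])[0]
    return bool(flags & 0x8000) and message_id(reply) == message_id(query)
```

Note that the DNS software, not the kernel, does this matching. Conntrack knows nothing about the ID and relies only on addresses, ports and protocol.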

A typical UDP "connection" for DNS is exactly two packets: a query comes in, an answer is returned. (From the resolver/client's view it's reversed: a query going out, and an answer coming back.) As a result, a busy named server can have lots of these entries in its conntrack table. Each entry requires kernel-space memory, of course, and each entry counts against the total number of entries that the table can accommodate. And each entry remains in the conntrack table until it times out, minutes later, an unreasonably long period for DNS.
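A back-of-the-envelope calculation shows why the table fills up. If every query creates one entry that stays in the table for roughly the timeout, then steady-state occupancy is about the query rate multiplied by the timeout (Little's law). The figures below are hypothetical: 5,000 queries per second and an assumed 30-second timeout. Check your own kernel's UDP timeout settings, because longer timeouts scale the result up proportionally. The 65536 comes from the sample table sizes shown later in this article.

```python
def conntrack_entries(queries_per_second: float, timeout_seconds: float) -> int:
    """Approximate steady-state entries: arrival rate x time each entry lives."""
    return round(queries_per_second * timeout_seconds)


def max_sustainable_qps(table_max: int, timeout_seconds: float) -> float:
    """Query rate at which a table of table_max entries is exactly full."""
    return table_max / timeout_seconds


if __name__ == "__main__":
    # Hypothetical load and an assumed 30 s timeout, not measured values.
    print(conntrack_entries(5000, 30), "entries needed vs. a 65536-entry table")
    print(round(max_sustainable_qps(65536, 30)), "queries/s fills the table")
```

Under these assumptions the table would need about 150,000 entries, more than twice the 65536 default. Once it is full, new queries are dropped.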

An authoritative nameserver is generally going to accept all packets on 53/UDP from anywhere. A recursive resolver is going to accept all packets on 53/UDP from its own networks. Firewall query rate limiting is possible in each case, but ISC does not recommend it.

Therefore, you might as well disable connection tracking for your 53/UDP DNS queries and replies. Fortunately this is very easy to do, and it should be supported on all recent mainstream GNU/Linux distributions. (It might not be possible on custom kernels, if the Netfilter modules are not available. In that case the answer is to fix your kernel.)

The Netfilter raw table and the NOTRACK target were introduced sometime during the heyday of the Linux 2.4 kernels. Later on, NOTRACK was superseded by the CT target with its --notrack option. If your Linux kernel is a late 2.6 release or anything newer, you should have CT and --notrack. (If not, have you considered upgrading? Even 2.6 is getting old now.)

Tables and Chains

Linux Netfilter iptables consists of several independent "tables", each of which has predefined "chains". iptables(8) is the userspace binary which manipulates rules in the kernel. Contrary to what you might think, there is no daemon process running; "to start iptables" or "to stop iptables" is an inaccurate way of saying "to change the kernel's Netfilter rules."

In this article we are mainly concerned with the "raw" table, but we will also touch on the "filter" table. Then we will briefly mention the "nat" table.

The raw table is so named because it sees raw network traffic: its chains are traversed before connection tracking takes place and before any other table (mangle, nat or filter) sees the packet. The main purpose of raw is to disable connection tracking for selected packets. The raw table provides the following built-in chains: PREROUTING (for packets arriving via any network interface) and OUTPUT (for packets generated by local processes).

The filter table is the place for filtering packets, typically the main purpose of a firewall. The filter table has three built-in chains: INPUT (for packets destined to local sockets), FORWARD (for packets being routed through the box), and OUTPUT (for locally-generated packets).

Not a complete firewall how-to

This article cannot go into detail on how to set up your firewall's filtering; we will simply show the few rules you are likely to need for bypassing conntrack for DNS in UDP. Also, it assumes that you need a firewall; perhaps if you are behind an upstream firewall, you can simply disable the one on the nameserver.

Finally, let's talk about the nat table. This is for network address translation (NAT), and if you are doing NAT on your DNS packets, you are not going to be able to use the following sample rules. NAT depends on connection tracking. If this is the case for you, skip ahead to the section "What do I do if I must have connection tracking?"

Sample Rules

These rules are in iptables-save(8)/iptables-restore(8) format. They can easily be converted into iptables(8) commands by prefixing each -A line with iptables, plus -t table when the table is not filter (for example, iptables -t raw -A PREROUTING ...).

Here is the raw table in its entirety, including the comments added by iptables-save(8):

# Generated by iptables-save v1.4.20 on Fri May 16 12:42:55 2014
*raw
:PREROUTING ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A PREROUTING -p udp -m udp --dport 53 -j CT --notrack
-A PREROUTING -p udp -m udp --sport 53 -j CT --notrack
-A OUTPUT -p udp -m udp --dport 53 -j CT --notrack
-A OUTPUT -p udp -m udp --sport 53 -j CT --notrack
COMMIT
# Completed on Fri May 16 12:42:55 2014

The two rules in each of PREROUTING and OUTPUT match UDP packets with destination port (--dport) and source port (--sport) 53 (respectively) and tell the kernel not to track their connections.
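The conversion from iptables-save format to individual iptables(8) commands, described before the sample, can be sketched as follows. The function name is hypothetical. Only -A lines are converted. Comments, chain policy lines (":CHAIN POLICY [packets:bytes]") and COMMIT are skipped. That is enough for the samples in this article, but a ruleset with user-defined chains would also need -N commands, which this sketch does not emit.

```python
from typing import List


def to_commands(ruleset: str, binary: str = "iptables") -> List[str]:
    """Turn iptables-save output into equivalent iptables(8) commands.

    Pass binary="ip6tables" for an IPv6 ruleset.
    """
    commands = []
    table = "filter"                       # iptables' default table
    for raw_line in ruleset.splitlines():
        line = raw_line.strip()
        if line.startswith("*"):           # "*raw" opens the raw table section
            table = line[1:]
        elif line.startswith("-A "):       # an appended rule
            prefix = binary if table == "filter" else f"{binary} -t {table}"
            commands.append(f"{prefix} {line}")
    return commands


if __name__ == "__main__":
    sample = """*raw
-A PREROUTING -p udp -m udp --dport 53 -j CT --notrack
COMMIT
"""
    for command in to_commands(sample):
        print(command)
```

Running it on the raw-table sample gives commands such as "iptables -t raw -A PREROUTING -p udp -m udp --dport 53 -j CT --notrack". Keep in mind that entering rules one by one appends them to whatever rules are already loaded.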

Older kernels might differ

Note: older kernels might not have the CT --notrack target, but the now deprecated NOTRACK target is functionally the same.
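If you manage several servers, some with older kernels, you may find it convenient to generate the raw table rather than edit it by hand. Here is a sketch (the function and parameter names are hypothetical) that emits the same four rules shown above in iptables-restore format. It uses either CT --notrack or the deprecated NOTRACK target.

```python
def raw_notrack_ruleset(legacy_target: bool = False, port: int = 53) -> str:
    """Build the raw-table ruleset shown above in iptables-restore format.

    legacy_target=True selects the deprecated NOTRACK target for kernels
    that lack CT --notrack; otherwise CT --notrack is used.
    """
    target = "NOTRACK" if legacy_target else "CT --notrack"
    lines = ["*raw", ":PREROUTING ACCEPT [0:0]", ":OUTPUT ACCEPT [0:0]"]
    for chain in ("PREROUTING", "OUTPUT"):
        # Queries have destination port 53; replies have source port 53.
        for option in ("--dport", "--sport"):
            lines.append(f"-A {chain} -p udp -m udp {option} {port} -j {target}")
    lines.append("COMMIT")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(raw_notrack_ruleset(), end="")
```

The output can be piped into iptables-restore(8) or ip6tables-restore(8) as root. Check which target your kernel supports before loading it.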

Next, the filter table; again, we can't cover the entire filter table here. Many sample rulesets you can find will have a RELATED,ESTABLISHED rule like one of these:

# OLD filter rules; each pair is functionally equivalent

-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT

-A FORWARD -m state --state RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT

Typically these rules should be at or near the beginning of each chain's rules. Note that you should have one or the other, not both, for each of INPUT and FORWARD.

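We need to switch any -m state --state rule to -m conntrack --ctstate, and to add ,UNTRACKED to the --ctstate list. UNTRACKED is the virtual state given to packets that the raw table rules exempted from tracking. Without it, untracked packets no longer match the RELATED,ESTABLISHED rule, so, for example, the replies to your nameserver's own outgoing queries could be dropped. The conntrack match is the more complete successor to the older state match extension, so it is the one to use here. Also note that the order of the arguments in the --ctstate list is not significant; ESTABLISHED,UNTRACKED,RELATED will work just as well. Your new rules should look something like this: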

# NEW filter rules

-A INPUT -m conntrack --ctstate RELATED,ESTABLISHED,UNTRACKED -j ACCEPT
-A FORWARD -m conntrack --ctstate RELATED,ESTABLISHED,UNTRACKED -j ACCEPT
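Editing a long ruleset by hand is error-prone, so here is a sketch of the rewrite described above. It works on single rules in iptables-save format, and the function name is hypothetical. Every -m state --state match is converted to -m conntrack --ctstate. UNTRACKED is added only to state lists that contain ESTABLISHED, meaning the accept rules discussed here, and only if it is missing. Negated matches (! --state) are not handled.

```python
import re

STATE_MATCH = re.compile(r"-m state --state (\S+)")
CTSTATE_LIST = re.compile(r"--ctstate (\S+)")


def modernize_rule(rule: str) -> str:
    """Convert a state match to conntrack and add UNTRACKED where needed."""
    rule = STATE_MATCH.sub(r"-m conntrack --ctstate \1", rule)
    match = CTSTATE_LIST.search(rule)
    if match:
        states = match.group(1).split(",")
        if "ESTABLISHED" in states and "UNTRACKED" not in states:
            states.append("UNTRACKED")
            rule = rule[: match.start(1)] + ",".join(states) + rule[match.end(1):]
    return rule


if __name__ == "__main__":
    old = "-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT"
    print(modernize_rule(old))
```

The example prints "-A INPUT -m conntrack --ctstate RELATED,ESTABLISHED,UNTRACKED -j ACCEPT". Review the result before loading it. The sketch only changes text and knows nothing about the rest of your firewall.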

A brief word about the filter table's OUTPUT chain: if you are filtering traffic on OUTPUT, we trust that you know what you are doing. If you don't know what you are doing, please consider not trying to filter OUTPUT. You can break a lot of things which will be very difficult to fix, and you are unlikely to be addressing any real security concern.

For this reason a sample OUTPUT rule is not shown, but you might need it if there are any blocking rules in OUTPUT. The only difference is the name of the chain after the "-A".

See your distribution's documentation!

How and where a GNU/Linux distribution stores firewall rules to restore on reboot or reload varies widely. Some distributions provide a means to save rules (such as Red Hat's service iptables save command), and others might require you to edit a file (or to redirect iptables-save(8) output to one). This article cannot attempt to document all these varied procedures.

What do I do if I must have connection tracking?

Don't worry. It's not that bad. The default size of the conntrack table is very conservative with memory, and most systems capable of handling today's DNS workloads have plenty of RAM at their disposal. You only have to increase the size of the table.

This is on an aging 3.2.13 system with 4GB of physical RAM:

chuck@chestnut:$ cat /proc/sys/net/ipv4/netfilter/ip_conntrack_max
65536

This is on an ISC laptop, kernel 3.12.7 and 16GB RAM:

cba@tp:$ cat /proc/sys/net/nf_conntrack_max
65536

And this is from an ancient Slackware 10.0 machine, 2.4.26 kernel and 1GB RAM:

cba@sorry:$ cat /proc/sys/net/ipv4/ip_conntrack_max
57344

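As you can see, over the years this sysctl(8) setting has changed a few times. It's no problem to increase it. Note that the value is a maximum number of table entries, not a size in bytes. Entries are allocated only as needed, but each one consumes kernel memory, so choose a ceiling your RAM can support. Here are samples for each of the above:

chestnut:/etc/sysctl.conf :
# 128 M entries
net.ipv4.netfilter.ip_conntrack_max = 134217728

tp:/etc/sysctl.conf :
# 512 M entries
net.nf_conntrack_max = 536870912

sorry:/etc/sysctl.conf :
# 512 K entries
net.ipv4.ip_conntrack_max = 524288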

Then run "sysctl -p" as root to apply these settings.
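Because the setting's name depends on the kernel, a small helper can find out which one applies before you edit sysctl.conf. This sketch (function names hypothetical) probes the three /proc locations shown above, newest first. It reports the matching sysctl(8) name, obtained by dropping the /proc/sys/ prefix and turning slashes into dots, along with the current value. The root parameter exists only so the function can be pointed at a test directory.

```python
from pathlib import Path
from typing import Optional, Tuple

# The three locations shown above, newest first; which one exists depends
# on the kernel version and on which conntrack modules are loaded.
CANDIDATES = (
    "/proc/sys/net/nf_conntrack_max",
    "/proc/sys/net/ipv4/netfilter/ip_conntrack_max",
    "/proc/sys/net/ipv4/ip_conntrack_max",
)


def sysctl_name(proc_path: str) -> str:
    """Map a /proc/sys path to its sysctl(8) name (slashes become dots)."""
    return proc_path[len("/proc/sys/"):].replace("/", ".")


def find_conntrack_max(root: str = "/") -> Optional[Tuple[str, int]]:
    """Return (sysctl name, current value) for the first path that exists."""
    for path in CANDIDATES:
        candidate = Path(root) / path.lstrip("/")
        if candidate.exists():
            return sysctl_name(path), int(candidate.read_text().split()[0])
    return None  # no conntrack module loaded, or an unexpected kernel


if __name__ == "__main__":
    print(find_conntrack_max())
```

If it prints None, connection tracking is probably not loaded at all, which also means it cannot be the cause of your slow responses.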

What about IPv6?

The sample rules shown above are identical for IPv6, but they have to be loaded with a different set of commands: ip6tables(8) for individual rule changes and ip6tables-restore(8) to load an entire ruleset at once.

IPv4 and IPv6 firewall rules are separate

IPv4 and IPv6 rules are maintained and manipulated separately in the Linux kernel. Rules which are entered for one IP version do not affect the other.
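Because the two rule sets are separate, the same raw-table ruleset must be loaded twice: once with iptables-restore(8) and once with ip6tables-restore(8). Here is a sketch of that step, with hypothetical function names. Building the command list is kept separate from running it, so the plan can be reviewed without root privileges. Without the --noflush option, each tool replaces the existing contents of the tables named in its input, which here is only raw.

```python
import subprocess
from typing import List, Tuple

RAW_RULES = """*raw
:PREROUTING ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A PREROUTING -p udp -m udp --dport 53 -j CT --notrack
-A PREROUTING -p udp -m udp --sport 53 -j CT --notrack
-A OUTPUT -p udp -m udp --dport 53 -j CT --notrack
-A OUTPUT -p udp -m udp --sport 53 -j CT --notrack
COMMIT
"""


def restore_commands(ruleset: str = RAW_RULES) -> List[Tuple[List[str], str]]:
    """Pair the same ruleset with the IPv4 and the IPv6 restore tools."""
    return [(["iptables-restore"], ruleset), (["ip6tables-restore"], ruleset)]


def apply_rulesets(commands: List[Tuple[List[str], str]]) -> None:
    """Feed each ruleset on stdin. Must run as root; changes live rules!"""
    for argv, text in commands:
        subprocess.run(argv, input=text, text=True, check=True)


if __name__ == "__main__":
    for argv, _ in restore_commands():
        print("would run:", " ".join(argv))
```

Rules loaded this way last only until the next reboot. See your distribution's documentation for how to make them persistent.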

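The sysctl(8) settings in the above section need no separate IPv6 counterpart on kernels that use nf_conntrack (such as the tp example). IPv4 and IPv6 connections share one connection tracking table, so the single nf_conntrack_max limit covers both. (The older ip_conntrack module tracked IPv4 only.)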

What about TCP?

To date we are not aware of any Linux-based BIND nameservers which have had this problem associated with TCP DNS queries. Note also that a TCP DNS query involves more than just two packets; there is the overhead of setting up (and later tearing down) the TCP connection. There could also be more than one packet in the response to the query.

Therefore we see no need to disable connection tracking for DNS in TCP. In general, connection tracking is a good thing. It's only DNS in UDP where it can get out of hand, keeping too many old, stale connections in the tracking table.

For further reading

The best references for Linux Netfilter and iptables are the manuals which are provided with the software, most notably: iptables(8)/ip6tables(8) and iptables-extensions(8). (The latter might not exist in older Netfilter releases, but all the match and target extensions were then documented in the main iptables(8) manual. The extensions are mostly the same for IP versions 4 and 6, so there is no separate manual for IPv6.) See also iptables-save(8)/ip6tables-save(8) and iptables-restore(8)/ip6tables-restore(8). Ideally one should refer to the local copies of these manuals rather than online copies, because there are always slight version differences which can cause confusion.

Online resources

https://en.wikipedia.org/wiki/Netfilter: The Wikipedia page gives a good overview of Netfilter.

https://www.netfilter.org/: The Netfilter project's own site.

https://inai.de/links/iptables/: Some good original content (including the packet diagram), external links, and information about IRC help for Netfilter.

ISC provides professional support for BIND 9, and our support services can include Linux Netfilter assistance. Please see https://www.isc.org/support/ for more information.

Your distribution of GNU/Linux has its own website and user community. Distrowatch is a site which probably has links to them all.