Testing authoritative server support for EDNS and large UDP buffer sizes in BIND 9.10
The EDNS fallback code was re-worked in BIND 9.10 to make it more resilient and reliable when:
- Encountering new authoritative servers that have not been queried before, where support for EDNS and large buffer sizes is unknown, both on the server and on the network path between it and the resolver.
- Facing intermittent network packet losses which, on older versions of BIND, can result in SERVFAILs due to servers that should support EDNS being marked as EDNS-incapable.
The EDNS code in BIND 9.10 records, for each server named talks to, counts of successful plain DNS and EDNS queries, as well as timeouts for plain DNS queries and for EDNS queries at each of the advertised EDNS buffer sizes: 4096, 1432, 1232 and 512. An EDNS timeout for a lower buffer size is also counted against higher buffer sizes. These are held in 8-bit counters, all of which are shifted when any one counter overflows. This removes from the history, over time, any false positives due to transitory network problems.
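The bookkeeping described above can be sketched roughly as follows. This is an illustration only, not named's actual code: the class and method names are hypothetical, and the exact moment at which named performs the shift (here, as soon as any counter reaches 255) is an assumption.

```python
from dataclasses import dataclass, field

SIZES = (4096, 1432, 1232, 512)  # advertised EDNS sizes, largest first
COUNTER_MAX = 0xFF               # the counters are 8 bits wide


@dataclass
class EdnsHistory:
    """Hypothetical per-server history, modelled on the description above."""
    edns_ok: int = 0
    plain_ok: int = 0
    plain_timeout: int = 0
    edns_timeout: dict = field(default_factory=lambda: {s: 0 for s in SIZES})

    def _shift_if_full(self):
        # Assumption: once any counter reaches its maximum, halve all of
        # them (a one-bit right shift) so that old events fade away.
        counters = [self.edns_ok, self.plain_ok, self.plain_timeout]
        counters += list(self.edns_timeout.values())
        if max(counters) >= COUNTER_MAX:
            self.edns_ok >>= 1
            self.plain_ok >>= 1
            self.plain_timeout >>= 1
            for size in SIZES:
                self.edns_timeout[size] >>= 1

    def record_edns_success(self):
        self.edns_ok += 1
        self._shift_if_full()

    def record_plain_success(self):
        self.plain_ok += 1
        self._shift_if_full()

    def record_plain_timeout(self):
        self.plain_timeout += 1
        self._shift_if_full()

    def record_edns_timeout(self, size):
        # A timeout at one size also counts against every larger size.
        for s in SIZES:
            if s >= size:
                self.edns_timeout[s] += 1
        self._shift_if_full()

    def dump(self):
        # Same field order as the cache dump: success, then 4096..512 timeouts.
        t = self.edns_timeout
        return (f"[edns {self.edns_ok}/{t[4096]}/{t[1432]}/{t[1232]}/{t[512]}] "
                f"[plain {self.plain_ok}/{self.plain_timeout}]")
```

Because every counter is halved together, the ratio between successes and timeouts is roughly preserved while isolated old timeouts eventually round down to zero.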
The buffer sizes of 1432 and 1232 are chosen to allow for an IPv4/IPv6 encapsulated UDP message to be sent without fragmentation at Ethernet and IPv6 network minimum MTU sizes. In addition, named records the largest successful EDNS response size seen.
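One way to arrive at these two values is shown below. The MTU and header sizes are standard, but the particular breakdown (IPv6 carried inside IPv4 for the Ethernet case, plain IPv6 for the minimum-MTU case) is an assumed reading of "IPv4/IPv6 encapsulated", not something the text spells out.

```python
ETHERNET_MTU = 1500   # typical Ethernet MTU, in bytes
IPV6_MIN_MTU = 1280   # minimum MTU every IPv6 link must support
IPV4_HEADER = 20      # IPv4 header without options
IPV6_HEADER = 40      # fixed IPv6 header
UDP_HEADER = 8

# Assumed reading: an IPv6 packet tunnelled in IPv4 over Ethernet.
size_ethernet = ETHERNET_MTU - IPV4_HEADER - IPV6_HEADER - UDP_HEADER

# Assumed reading: a native IPv6 packet on a minimum-MTU link.
size_ipv6_min = IPV6_MIN_MTU - IPV6_HEADER - UDP_HEADER

print(size_ethernet, size_ipv6_min)  # prints "1432 1232"
```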
When querying a new server for the first time, named will send an EDNS query advertising a 512-byte UDP buffer. This is the most conservative EDNS message that can be sent. If successful, subsequent queries will 'probe' the capabilities of the authoritative server by advertising successively larger EDNS sizes.
When querying a known server using EDNS, named will choose an EDNS buffer size based on the history of EDNS timeouts at various advertised sizes and also on the largest successful EDNS response that it has already received from that server. Note however that named cannot learn that a server (and the path between itself and the remote server) can support larger UDP packet sizes until it successfully receives a large response from that server.
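As a rough illustration of how the two inputs might be combined for a server that already has history, consider the sketch below. The function name, the threshold of 3 timeouts, and the rule for combining the inputs are all hypothetical; named's real selection logic is not documented here and may differ. The initial 512-byte probe for a new server is not modelled.

```python
SIZES = (4096, 1432, 1232, 512)  # advertised EDNS sizes, largest first


def choose_udp_size(timeouts, largest_seen, threshold=3):
    """Pick a size to advertise (hypothetical rule, for illustration).

    timeouts     -- dict mapping advertised size to its timeout count
    largest_seen -- largest successful EDNS response received so far
    threshold    -- assumed number of timeouts that rules a size out
    """
    choice = SIZES[-1]  # fall back to the most conservative size
    for size in SIZES:
        if timeouts.get(size, 0) < threshold:
            choice = size
            break
    # A response of this size has already arrived intact, so there is
    # no reason to advertise anything smaller.
    return max(choice, largest_seen)


# Repeated timeouts at 4096 only: step down to 1432.
print(choose_udp_size({4096: 5, 1432: 0, 1232: 0, 512: 0}, 1118))
```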
If any query results in a response with TC=1 being returned, then named will re-send the query using TCP - it will not take this opportunity to try with a larger advertised EDNS UDP packet size (because this might further delay getting a query response to the client).
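If you are inspecting captured responses yourself, the TC bit can be read directly from the DNS header. The helper below is a small sketch (the function name is hypothetical); it relies only on the standard header layout, in which the 16-bit flags field follows the message ID and TC is the bit with value 0x0200.

```python
import struct

TC_BIT = 0x0200  # truncation flag within the 16-bit DNS header flags


def is_truncated(message: bytes) -> bool:
    """Return True if a raw DNS message has TC=1 set in its header."""
    (flags,) = struct.unpack("!H", message[2:4])
    return bool(flags & TC_BIT)
```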
If there are too many timeouts to EDNS queries, and successful plain DNS query responses are recorded in the query counts, then named will fall back to using plain DNS when talking to a server, although it will still periodically send an EDNS query to see if the server now supports EDNS.
The Address database dump section of a cache dump now displays the counters being used to track success/failure history - for example:
; 192.0.2.30 [srtt 19320] [flags 00004000] [edns 3/0/0/0/0] [plain 0/0] [udpsize 1118] [ttl 1350]
The edns and plain counters are incremented as you would expect them to be as the responses are successfully received or timeouts are logged. But as noted above, all of them are adjusted when any one of them overflows (a bitwise shift) in order to eliminate over time any false positives and counts of transient errors, and to force a periodic retry with EDNS (if the server has not yet responded to queries with EDNS) and again to probe larger EDNS packet sizes.
Their interpretation is provided at the start of the Address database dump section:
; [edns success/4096 timeout/1432 timeout/1232 timeout/512 timeout] ; [plain success/timeout]
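For scripted analysis of a cache dump, an entry line such as the example above can be split into named fields. This is a minimal sketch with a hypothetical helper name; it assumes the line has exactly the bracketed layout shown in the example and does no error handling.

```python
import re

EXAMPLE = ("; 192.0.2.30 [srtt 19320] [flags 00004000] [edns 3/0/0/0/0] "
           "[plain 0/0] [udpsize 1118] [ttl 1350]")


def parse_adb_line(line):
    """Turn one ADB entry line into a dict of labelled counters."""
    address = line.lstrip("; ").split()[0]
    # Each bracketed group is "[key value]".
    fields = dict(re.findall(r"\[(\w+) ([^\]]+)\]", line))
    edns = [int(n) for n in fields["edns"].split("/")]
    plain = [int(n) for n in fields["plain"].split("/")]
    return {
        "address": address,
        "edns_success": edns[0],
        # Order follows the legend: 4096, 1432, 1232, 512.
        "edns_timeouts": dict(zip((4096, 1432, 1232, 512), edns[1:])),
        "plain_success": plain[0],
        "plain_timeout": plain[1],
        "udpsize": int(fields["udpsize"]),
    }


entry = parse_adb_line(EXAMPLE)
print(entry["address"], entry["edns_success"], entry["udpsize"])
# prints "192.0.2.30 3 1118"
```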
ADB server entries persist for up to 10 minutes. They are displayed by name(s) as well as address unless address entries (A and AAAA) have expired from cache, in which case they appear only in the Unassociated entries table in the cache dump. (But they may be re-associated with their name(s) later, if the cache information is refreshed.)
If you have been testing servers using dig or monitoring DNS queries and responses with packet tracing, you will have observed that servers also advertise an EDNS buffer size when they respond to clients. Intuitively you might expect them to be advertising the maximum payload they can respond with to a client. This would be incorrect! What they are advertising is the maximum size they can receive as a client. There is a good reason for this: it provides a mechanism by which a client about to send a dynamic UPDATE to a server can find out the acceptable size limit beforehand by using a trivial QUERY. There is no other mechanism by which this can be determined. This is covered in section 4.5.4 of RFC 2671.
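The advertised size travels in the CLASS field of the OPT pseudo-record (type 41) in the additional section of a message. The sketch below, using hypothetical helper names, builds a minimal query carrying an OPT record and reads the advertised size back out of any raw DNS message, which is one way to check what a server or client is advertising in a packet trace. It handles only the simple cases needed for this purpose and is not a full DNS parser.

```python
import struct

OPT_TYPE = 41  # EDNS OPT pseudo-record type


def build_query(name, udp_size, qtype=1, msg_id=0x1234):
    """Build a one-question query with an OPT record advertising udp_size."""
    header = struct.pack("!HHHHHH", msg_id, 0, 1, 0, 0, 1)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # class IN
    # OPT: root name, type 41, CLASS = UDP size, TTL = 0, no RDATA.
    opt = b"\x00" + struct.pack("!HHIH", OPT_TYPE, udp_size, 0, 0)
    return header + question + opt


def skip_name(msg, pos):
    """Return the offset just past the (possibly compressed) name at pos."""
    while True:
        length = msg[pos]
        if length == 0:
            return pos + 1
        if (length & 0xC0) == 0xC0:  # compression pointer, two bytes
            return pos + 2
        pos += 1 + length


def advertised_udp_size(msg):
    """Return the EDNS UDP size advertised in msg, or None if no OPT."""
    _id, _flags, qd, an, ns, ar = struct.unpack("!HHHHHH", msg[:12])
    pos = 12
    for _ in range(qd):
        pos = skip_name(msg, pos) + 4  # skip QTYPE and QCLASS
    for _ in range(an + ns + ar):
        pos = skip_name(msg, pos)
        rtype, rclass, _ttl, rdlen = struct.unpack("!HHIH", msg[pos:pos + 10])
        if rtype == OPT_TYPE:
            return rclass
        pos += 10 + rdlen
    return None


query = build_query("example.com", 1232)
print(advertised_udp_size(query))  # prints "1232"
```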