Serve-stale Implementation Details
This article describes the serve-stale and prefetch implementations in BIND 9.17.11 and 9.16.13, and discusses how these features interact with fetch-limits and other quota mechanisms. It provides background on the logic as implemented and is not intended to give explicit guidance on how to set these parameters.
Serve-stale
- The max-stale-ttl configuration is stored in a per-view cache.
- An RRset in any given cache is marked as stale during RRset lookup in that cache, if ALL of the following conditions apply:
  - The RRset's TTL has reached zero, i.e. the RRset is expired.
  - stale-cache-enable is set to yes in the configuration.
  - The RRset expiry is less than max-stale-ttl seconds ago.
- If stale-refresh-time is zero (disabled), then:
  - Lookup of a stale RRset in cache only takes place when a previous attempt to refresh the RRset from authoritative servers has failed.
  - The lookup in cache happens in the same request, right after the failed attempt to refresh the RRset.
  - All subsequent requests for the same RRset follow the same path: try to refresh from name servers, fail, try cache.
  - Note that the default behavior in BIND since stale-refresh-time was added is to have it enabled, with a value of 30 seconds.
- If stale-refresh-time is non-zero (enabled), then a lookup MAY return a stale RRset from cache before going into recursion if:
  - The RRset is marked as stale.
  - A previous attempt to refresh the RRset has failed.
  - The lookup happens during the period stale-refresh-time after the refresh failure.
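The conditions in the list above can be sketched in a few lines of Python. This is only an illustrative model, not BIND's code: the CachedRRset class, its fields, and both function names are hypothetical, and the strict "less than" comparisons at the window boundaries are an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedRRset:
    """Hypothetical cache entry; all times are seconds on a single clock."""
    expires_at: float                          # moment the TTL reaches zero
    refresh_failed_at: Optional[float] = None  # last failed refresh, if any


def is_stale(rrset: CachedRRset, now: float,
             stale_cache_enable: bool, max_stale_ttl: float) -> bool:
    """All three conditions from the list must hold."""
    expired = now >= rrset.expires_at
    within_limit = (now - rrset.expires_at) < max_stale_ttl
    return expired and stale_cache_enable and within_limit


def may_answer_before_recursion(rrset: CachedRRset, now: float,
                                stale_cache_enable: bool,
                                max_stale_ttl: float,
                                stale_refresh_time: float) -> bool:
    """True if a stale answer may be returned without first trying a refresh."""
    if stale_refresh_time == 0:
        # Disabled: every request tries the name servers first.
        return False
    if not is_stale(rrset, now, stale_cache_enable, max_stale_ttl):
        return False
    if rrset.refresh_failed_at is None:
        # No failed refresh yet, so no window has been opened.
        return False
    return (now - rrset.refresh_failed_at) < stale_refresh_time
```

With an RRset that expired at t=100 and a refresh that failed at t=105, a lookup at t=120 falls inside a 30-second window, while a lookup at t=140 does not and goes back to recursion.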
Negative Cached Content and Serve-stale
Stale negative cached content (NXDOMAIN or NXRRSET) is handled slightly differently because clients prefer positive answers. If there is a stale NXDOMAIN or NXRRSET in cache, BIND returns it only if the resolver query times out (stale negative data will not be returned on stale-answer-client-timeout). Although stale-answer-client-timeout is not used to provide an earlier response to clients from negative stale cache RRsets, once a refresh attempt for these RRs has eventually timed out, the stale-refresh-time window is started so that subsequent client queries receive the stale response immediately.
Fetch-limits
Fetch-limits include the fetches-per-server and fetches-per-zone quota mechanisms.
The action taken when a query exceeds any of the fetch-limits is not to process the query (that is, not to initiate any new 'fetch' to obtain an answer to send to the client).
The response to the client when such a query is dropped varies depending on the fetch-limit triggered, as follows:
- fetches-per-server: the default action is to return a SERVFAIL to the client.
- fetches-per-zone: no responses are sent to the client; the client observes this as a timeout.
It is possible to change the client response behavior for both the fetches-per-zone and fetches-per-server options in named.conf.
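As a rough illustration of the defaults just described, the mapping from a triggered fetch-limit to what the client sees could be modelled as follows. The function name, the overrides argument, and the "fail"/"drop" action labels are this sketch's own conventions, not a statement about how named stores its configuration.

```python
from typing import Dict, Optional

# Default client-facing outcome per fetch-limit, as described above.
# "drop" means no response is sent, so the client observes a timeout.
DEFAULT_ACTION: Dict[str, str] = {
    "fetches-per-server": "fail",
    "fetches-per-zone": "drop",
}


def client_response(limit: str,
                    overrides: Optional[Dict[str, str]] = None) -> Optional[str]:
    """Return the rcode sent to the client, or None when nothing is sent."""
    actions = {**DEFAULT_ACTION, **(overrides or {})}
    return "SERVFAIL" if actions[limit] == "fail" else None
```

Passing an override such as {"fetches-per-zone": "fail"} models an operator changing the response behavior for that limit.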
Prefetch
Prefetching takes place in the late stage of processing a client query, in the response-building phase; more specifically, it occurs during execution of the following functions:
- query_respond_any: Build the response for a query for type ANY.
- query_addanswer: Fill the ANSWER section of a positive response.
- query_cname: Handle CNAME responses.
- query_dname: Handle DNAME responses.
Prefetching code performs some quota verification, in the following order:
- Check if the recursive-clients quota is below the soft clients value. If yes, prefetch attaches to the recursive-clients quota.
- If there is a fetch context already created for <qname,qtype,qclass> (let's call it curr_fctx), then:
  - Let fctx_num_clients = number of clients currently associated with that fetch context.
  - If the current client address matches one of the addresses currently associated with curr_fctx, drop the prefetch and log the query as duplicated.
  - Else, if the current client address doesn't match any of the addresses currently associated with curr_fctx, then check if fctx_num_clients is less than the current auto-tuned value for 'clients-per-query'; if the check fails, drop the prefetch.
  - If none of the checks above abort prefetching, attach to curr_fctx and proceed.
- If the current number of fetches for the target domain is greater than or equal to the value of fetches-per-zone, then drop the fetch.
- If the number of current queries exceeds max-recursion-queries, then drop the fetch.
- Finally, prefetch tries to find a server address to which to send the query, one that isn't over quota, i.e. a server for which the number of current fetches does not exceed the configured fetches-per-server limit.
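The order of these checks can be sketched as a single function. This is a simplified model under stated assumptions: FetchContext, Quotas, and prefetch_decision are hypothetical names, every limit is assumed to be set to a positive value, and "attach to curr_fctx and proceed" is read here as continuing to the remaining checks, which is an interpretation of the list rather than something the article confirms.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class FetchContext:
    """Hypothetical stand-in for an existing fetch of <qname,qtype,qclass>."""
    client_addrs: Set[str] = field(default_factory=set)


@dataclass
class Quotas:
    """Hypothetical snapshot of the counters consulted by the checks."""
    recursive_clients: int
    soft_clients: int
    clients_per_query_now: int   # current auto-tuned value
    zone_fetches: int
    fetches_per_zone: int
    recursion_queries: int
    max_recursion_queries: int
    server_under_quota: bool     # some server is within fetches-per-server


def prefetch_decision(client: str, q: Quotas,
                      curr_fctx: Optional[FetchContext]) -> str:
    """Return "proceed", or the reason the prefetch is dropped."""
    if not q.recursive_clients < q.soft_clients:
        return "drop: recursive-clients"
    if curr_fctx is not None:
        fctx_num_clients = len(curr_fctx.client_addrs)
        if client in curr_fctx.client_addrs:
            return "drop: duplicate"
        if not fctx_num_clients < q.clients_per_query_now:
            return "drop: clients-per-query"
        curr_fctx.client_addrs.add(client)  # attach and carry on
    if q.zone_fetches >= q.fetches_per_zone:
        return "drop: fetches-per-zone"
    if q.recursion_queries > q.max_recursion_queries:
        return "drop: max-recursion-queries"
    if not q.server_under_quota:
        return "drop: fetches-per-server"
    return "proceed"
```

Returning the drop reason as a string keeps the sketch easy to test; the point is only the ordering, in which the client-level quotas are consulted before the zone-level and server-level limits.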
The impact of fetch-limits
How does prefetch interact with fetch-limits?
Prefetch is dropped if either the fetches-per-server or the fetches-per-zone quota is reached.
It is also dropped if any of the following quotas are reached:
- recursive-clients
- clients-per-query (actually, the value used is a self-adjusted one between clients-per-query and max-clients-per-query)
- max-recursion-queries
How does serve-stale interact with fetch-limits when serving of stale answers has been enabled?
- If there is eligible stale content with an active stale-refresh-time window, then no fetch is initiated and the stale answer is served to the client.
- When a fetch is dropped due to fetch-limits, then before sending SERVFAIL or DROP (depending on what is configured in fetch-limits), BIND checks whether there is stale data that could be used to answer the client query instead.
- Because no fetch is initiated when a query triggers fetch-limits, BIND can respond to the client using eligible (within max-stale-ttl) stale data, but it does not start a new stale-refresh-time window for the stale data used. A stale-refresh-time window is only opened when a refresh attempt has timed out.
Q. What is the logic path if content has expired and a client query comes in that would normally trigger a fetch (which ought to fail and lead to the content being marked for serving stale), but that fetch never happens because it is dropped due to fetch-limits?
A. If the content (RRset) has expired and a query comes in asking for it, then, assuming the RRset is not yet marked as stale and stale-cache-enable is yes, the following steps take place:
- A cache lookup is made, but expired entries are ignored (they are now marked as stale).
- A fetch is initiated which is dropped due to fetch-limits.
- A new cache lookup is made, now including stale entries. A response is sent to the client with the stale answer (if available).
- If there is no stale entry, a response is sent to the client (or not), depending on which fetch-limit was triggered (see the behavior described in the Fetch-limits section above).
Q. For a query dropped in this situation, does BIND initiate a stale-refresh-time window for this RRset?
A. A query dropped due to fetch-limits won't activate stale-refresh-time, as this is not considered a real failure in contacting the name servers in an attempt to refresh the given RRset.
- Although stale-answer-client-timeout will not be initiated when content cannot be refreshed due to fetch-limits, if there is eligible stale data, clients will still receive a prompt response using those stale cached RRsets.
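The two answers above can be condensed into a small sketch of what happens once a fetch has been dropped by a fetch-limit. The function name and its return shape are hypothetical, and only the default client responses for each limit are modelled.

```python
from typing import Optional, Tuple


def answer_when_fetch_dropped(stale_answer: Optional[str],
                              limit: str) -> Tuple[Optional[str], bool]:
    """Return (response, opens_stale_refresh_window) for a dropped fetch.

    stale_answer is the result of the second cache lookup, the one that
    includes stale entries; None means nothing eligible was found.
    """
    # A dropped fetch is not a failed refresh, so no window is opened.
    opens_window = False
    if stale_answer is not None:
        return stale_answer, opens_window
    if limit == "fetches-per-server":
        return "SERVFAIL", opens_window  # default action for this limit
    return None, opens_window            # fetches-per-zone default: no reply
```

The second element of the result is always False, which is the point of the second answer: a stale-refresh-time window is only opened by a refresh attempt that actually timed out.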
This KB has been updated to reflect the behavior in BIND 9.17.11, 9.16.13, and 9.16.13-S1. For more details of the change, see the GitLab issue.