---
title: "Serve-stale implementation details"
slug: "serve-stale-implementation-details"
description: "Details of the logic for applying serve-stale, prefetch, and fetch-limits in BIND 9."
tags: ["quotas", "serve-stale", "pre-fetch", "fetch-limits", "fetch limits", "serve stale", "prefetch"]
updated: 2021-03-19T11:20:43Z
published: 2021-03-19T11:20:43Z
canonical: "kb.isc.org/serve-stale-implementation-details"
stale: true
---


# Serve-stale Implementation Details

This article describes the serve-stale and prefetch implementations in BIND 9.17.11 and 9.16.13, and discusses how these features interact with fetch-limits and other quota mechanisms. It provides background on the logic as implemented and is not intended to give explicit guidance on how to set these parameters.

## Serve-stale

- The `max-stale-ttl` configuration is stored in a per-view cache.
- An RRset in any given cache is marked as stale during RRset lookup in that cache, if ALL of the following conditions apply:
  1. The RRset's TTL has reached zero, i.e. the RRset is expired.
  2. `stale-cache-enable` is set to `yes` in the configuration.
  3. The RRset expiry is less than `max-stale-ttl` seconds ago.
- If `stale-refresh-time` is zero (disabled), then:
  1. A lookup of a stale RRset in the cache only takes place when a previous attempt to refresh the RRset from the authoritative servers has failed.
  2. The cache lookup happens in the same request, right after the failed attempt to refresh the RRset.
  3. All subsequent requests for the same RRset follow the same path: try to refresh from the name servers, fail, try the cache.
  4. Note that this is not the default: since `stale-refresh-time` was added, BIND enables it by default with a value of 30 seconds.
- If `stale-refresh-time` is non-zero (enabled), then a lookup *MAY* return a stale RRset from cache before going into recursion if:
  1. The RRset is marked as stale.
  2. A previous attempt to refresh the RRset has failed.
  3. The lookup happens within `stale-refresh-time` seconds of the refresh failure.
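
The conditions above can be sketched as two predicates. This is an illustrative model only, not BIND's code: the `CachedRRset` class, its field names, and both function names are hypothetical, times are plain seconds, and the strict `<` comparisons are an assumption about how the window boundaries are treated.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedRRset:
    """Hypothetical model of a cache entry; not a BIND data structure."""
    expires_at: float                             # time at which the TTL reaches zero
    last_refresh_failure: Optional[float] = None  # time of the last failed refresh, if any


def is_stale(rrset, now, stale_cache_enable, max_stale_ttl):
    """True when all three staleness conditions hold at lookup time."""
    expired = now >= rrset.expires_at
    within_window = (now - rrset.expires_at) < max_stale_ttl
    return expired and stale_cache_enable and within_window


def may_serve_stale_before_recursion(rrset, now, stale_cache_enable,
                                     max_stale_ttl, stale_refresh_time):
    """True when a stale answer may be returned without recursing first."""
    if stale_refresh_time == 0:
        # Disabled: stale data is consulted only after a failed refresh
        # attempt made within the same request.
        return False
    if not is_stale(rrset, now, stale_cache_enable, max_stale_ttl):
        return False
    if rrset.last_refresh_failure is None:
        return False
    # Still inside the window opened by the last refresh failure?
    return (now - rrset.last_refresh_failure) < stale_refresh_time
```

For example, an RRset that expired at t=100 and failed to refresh at t=105 would, with `stale-refresh-time` set to 30, be eligible at t=110 but not at t=140.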

## Negative Cached Content and Serve-stale

Stale negative cached content (NXDOMAIN or NXRRSET) is handled slightly differently because clients prefer positive answers. If there is a stale NXDOMAIN or NXRRSET in cache, BIND returns it only if the resolver query times out; stale negative data is not returned when `stale-answer-client-timeout` expires. Although `stale-answer-client-timeout` is not used to give clients an earlier response from stale negative cache entries, once a refresh attempt for these RRs has eventually timed out, the `stale-refresh-time` window is started so that subsequent client queries receive the stale response immediately.
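
The difference between positive and negative stale data can be summarized in a small sketch. It assumes that serving stale answers is enabled and that an eligible stale entry exists; the function name and the three event labels are hypothetical names chosen for illustration, not BIND identifiers.

```python
def may_answer_from_stale(is_negative: bool, event: str) -> bool:
    """Decide whether a stale entry may be used for the given event.

    event is one of the illustrative labels:
      "client-timeout"   - stale-answer-client-timeout expired
      "resolver-timeout" - the refresh attempt itself timed out
      "refresh-window"   - a stale-refresh-time window is active
    """
    if event == "client-timeout":
        # Only positive stale data gives the client an early answer.
        return not is_negative
    return event in ("resolver-timeout", "refresh-window")
```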

## Fetch-limits

Fetch-limits include the `fetches-per-server` and `fetches-per-zone` quota mechanisms.

The action taken when a query exceeds any of the fetch-limits is not to process the query (that is, **not** to initiate any new 'fetch' to obtain an answer to send to the client).

The response to the client when such a query is dropped varies depending on the fetch-limit triggered, as follows:

- `fetches-per-server`: the default action is to return a SERVFAIL to the client.
- `fetches-per-zone`: no responses are sent to the client; the client observes this as a timeout.

*It is possible to change the client response behavior for both `fetches-per-zone` and `fetches-per-server` options in named.conf.*
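
As a rough illustration of the default responses and the optional override, consider the sketch below. The function and dictionary are hypothetical; the `"fail"` and `"drop"` strings are meant to mirror the optional action keywords these two options accept in `named.conf`, which is an assumption worth checking against the reference manual for your BIND version.

```python
from typing import Optional

# Default actions as described above; the dict itself is illustrative.
DEFAULT_ACTION = {
    "fetches-per-server": "fail",  # SERVFAIL is returned to the client
    "fetches-per-zone": "drop",    # no response; the client sees a timeout
}


def client_response(quota: str, configured_action: Optional[str] = None) -> Optional[str]:
    """Return the response sent to the client, or None when nothing is sent."""
    action = configured_action or DEFAULT_ACTION[quota]
    if action == "fail":
        return "SERVFAIL"
    if action == "drop":
        return None
    raise ValueError(f"unknown action: {action!r}")
```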

## Prefetch

Prefetching takes place in the late stage of processing a client query, in the response-building phase; more specifically, it occurs during execution of the following functions:

- `query_respond_any` - Build the response for a query for type ANY.
- `query_addanswer` - Fill the ANSWER section of a positive response.
- `query_cname` - Handle CNAME responses.
- `query_dname` - Handle DNAME responses.

Prefetching code performs some quota verification, in the following order:

1. Check if the `recursive-clients` quota is below the soft clients value. If yes, prefetch attaches to the `recursive-clients` quota.
2. If there is a fetch context already created for `<qname,qtype,qclass>` (let's call it `curr_fctx`), then:
   - Let `fctx_num_clients` = number of clients currently associated with that fetch context.
   - If the current client address matches one of the addresses currently associated with `curr_fctx`, drop the prefetch and log the query as duplicated.
   - Else, if the current client address doesn't match any of the addresses currently associated with `curr_fctx`, then check if `fctx_num_clients` is less than the current auto-tuned value for `clients-per-query`; if the check fails, drop the prefetch.
   - If none of the checks above abort prefetching, attach to `curr_fctx` and proceed.
3. If the current number of fetches for the target domain is greater than or equal to the value of `fetches-per-zone`, then drop the fetch.
4. If the number of current queries exceeds `max-recursion-queries`, then drop the fetch.
5. Finally, prefetch tries to find a server address on which to send the query, one that isn't over quota, i.e. a server in which the number of current fetches targeted does not exceed the configured `fetches-per-server` limit.
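
The ordering of these checks can be sketched as a single function. This is a simplified model under stated assumptions, not the BIND implementation: `FetchContext`, `prefetch_decision`, and all parameter names are hypothetical, every limit is assumed to be a positive number (special values such as "unlimited" are not modeled), and the comparisons follow the wording of the steps above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class FetchContext:
    """Hypothetical stand-in for an existing fetch for <qname,qtype,qclass>."""
    client_addrs: Set[str] = field(default_factory=set)


def prefetch_decision(*, client_addr: str, recursive_clients: int, soft_limit: int,
                      curr_fctx: Optional[FetchContext], tuned_clients_per_query: int,
                      zone_fetches: int, fetches_per_zone: int,
                      query_count: int, max_recursion_queries: int,
                      server_fetches: Dict[str, int], fetches_per_server: int) -> str:
    """Return "proceed" or a string naming the check that dropped the prefetch."""
    # 1. recursive-clients usage must be below the soft limit.
    if recursive_clients >= soft_limit:
        return "drop: recursive-clients"
    # 2. Checks against an already existing fetch context, if any.
    if curr_fctx is not None:
        if client_addr in curr_fctx.client_addrs:
            return "drop: duplicate query"
        if len(curr_fctx.client_addrs) >= tuned_clients_per_query:
            return "drop: clients-per-query"
    # 3. Per-zone fetch quota.
    if zone_fetches >= fetches_per_zone:
        return "drop: fetches-per-zone"
    # 4. Query count for this resolution.
    if query_count > max_recursion_queries:
        return "drop: max-recursion-queries"
    # 5. At least one server address must not be over its quota.
    if not any(n <= fetches_per_server for n in server_fetches.values()):
        return "drop: fetches-per-server"
    return "proceed"
```

Returning the name of the failed check, rather than a boolean, makes it easy to see which quota stopped a given prefetch when experimenting with the model.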

## The impact of fetch-limits

How does prefetch interact with fetch-limits? Prefetch is dropped if either the `fetches-per-server` or the `fetches-per-zone` quota is reached. It is also dropped if any of the following quotas are reached:

- `recursive-clients`
- `clients-per-query` (actually, the value used is a self-adjusted one between `clients-per-query` and `max-clients-per-query`).
- `max-recursion-queries`

How does serve-stale interact with fetch-limits when serving of stale answers has been enabled?

- If there is eligible stale content with an active `stale-refresh-time` window, then no fetch is initiated and the stale answer will be served to the client.
- When a fetch is dropped due to fetch-limits, then before sending SERVFAIL or DROP (depending on what's configured in fetch-limits), we'll look to see if there is stale data we could respond to the client query with instead.
- Because there is no fetch initiated when a query triggers fetch-limits, although we can respond to the client using eligible (within `max-stale-ttl`) stale data, we will not start a new `stale-refresh-time` for the stale data we use. A `stale-refresh-time` window is only opened when a refresh attempt has timed out.

Q. What is the logic path if content has expired and a client query comes in that would normally trigger a fetch (which ought to fail and lead to the content being marked for serving stale), but that fetch never happens because it is dropped because of fetch-limits?

A. If the content (RRset) has expired and a query comes in asking for it, then, assuming the RRset is not yet marked as stale, and `stale-cache-enable` is yes, the following steps take place:

1. A cache lookup is made, but expired entries are ignored (they are now marked as stale).
2. A fetch is initiated which is dropped due to fetch-limits.
3. A new cache lookup is made, now including stale entries. A response is sent to the client with the stale answer (if available).
4. If there is no stale entry, a response is sent to the client (or not), depending on which fetch-limit was triggered (see the behavior described in the beginning of this document).
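
Steps 3 and 4 can be sketched as follows, assuming the default fetch-limit actions (SERVFAIL for `fetches-per-server`, no response for `fetches-per-zone`). The function name and return shape are hypothetical; the second element of the returned tuple is always `False` to make explicit that a fetch dropped by fetch-limits does not open a `stale-refresh-time` window.

```python
from typing import Optional, Tuple


def answer_when_fetch_dropped(stale_answer: Optional[str],
                              triggered_quota: str) -> Tuple[Optional[str], bool]:
    """Return (response, stale_refresh_window_started).

    Models the point after the first cache lookup has ignored the expired
    entry and the resulting fetch has been dropped by a fetch-limit.
    """
    # Step 3: second cache lookup, this time including stale entries.
    if stale_answer is not None:
        return stale_answer, False
    # Step 4: no stale entry; fall back to the fetch-limit's default action.
    if triggered_quota == "fetches-per-server":
        return "SERVFAIL", False
    return None, False  # fetches-per-zone: nothing sent, client times out
```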

Q. For a query dropped in this situation, does BIND initiate a `stale-refresh-time` window for this RRset?

A. A query dropped due to fetch-limits won't activate `stale-refresh-time`, as this is not considered a real failure in contacting the name servers in an attempt to refresh the given RRset.

- Although `stale-answer-client-timeout` will not be initiated when content cannot be refreshed due to fetch-limits, if there is eligible stale data, clients will still receive a prompt response using those stale cached RRsets.

## Update

This KB has been updated to reflect the behavior in BIND 9.17.11, 9.16.13, and 9.16.13-S1. For more details of the change, see [the GitLab issue](https://gitlab.isc.org/isc-projects/bind9/-/issues/2434).
