How does clients-per-query work?

Question:

How does clients-per-query work when named is running?

There is often confusion over clients-per-query, especially when encountering logfile entries such as these below:

10-May-2011 13:01:02.745 resolver: notice: clients-per-query increased to 20
10-May-2011 13:21:02.747 resolver: notice: clients-per-query decreased to 19
10-May-2011 14:47:03.775 resolver: notice: clients-per-query increased to 15
10-May-2011 15:01:01.679 resolver: notice: clients-per-query increased to 15

Answer:

A client in this scenario is a unique query that has been received by named and which should receive its own query response. (Duplicate queries are identified and discarded.)

When multiple client queries are received for the same 'name' (in fact, the same name and type) and cause the server to perform queries on behalf of the requesters, named optimizes how it operates. Behind the scenes, named does the necessary work just once, and the multiple requests (clients) are linked to that one piece of work.

named limits the number of clients that can simultaneously be querying for a particular name/type. The initial limit is clients-per-query, which by default starts at 10. If 10 clients are already waiting for an answer for example.com/A, then the 11th client to ask for it is dropped. When an answer finally arrives for example.com/A, the limit is raised by 5.

Later, if there's another name/type that builds up to 15 clients waiting for an answer, and a 16th comes in and gets dropped, then when an answer arrives for that name, the limit will be raised to 20, and so on.

This continues until the limit has reached max-clients-per-query, which defaults to 100.

After a while, if named is getting along successfully with 100, it will try lowering the limit to 99, then 98, and so on, until it is back at the original 10.
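The adjustment cycle described above can be sketched as a small toy model. This is not named's source code: the class name, method names and the step_up parameter are hypothetical, and the values 10, 100 and 5 are simply the defaults quoted in this article. The answered flag reflects the rule, explained later in this article, that the limit is only raised when the fetch actually produced an answer.

```python
class ClientsPerQueryModel:
    """Toy model of the adaptive per-name/type client limit (illustrative only)."""

    def __init__(self, initial=10, maximum=100, step_up=5):
        self.initial = initial    # clients-per-query
        self.maximum = maximum    # max-clients-per-query
        self.step_up = step_up    # raise applied after a successful fetch that dropped clients
        self.limit = initial      # current limit, tuned between initial and maximum
        self.waiting = {}         # (name, type) -> number of clients waiting
        self.spilled = set()      # (name, type) keys that had to drop a client

    def add_client(self, name, rtype):
        """Return True if the client may wait on the fetch, False if it is dropped."""
        key = (name, rtype)
        count = self.waiting.get(key, 0)
        if count >= self.limit:
            self.spilled.add(key)
            return False
        self.waiting[key] = count + 1
        return True

    def fetch_done(self, name, rtype, answered=True):
        """Finish the fetch; raise the limit only if it succeeded after dropping clients."""
        key = (name, rtype)
        self.waiting.pop(key, None)
        was_spilled = key in self.spilled
        self.spilled.discard(key)
        if answered and was_spilled:
            self.limit = min(self.limit + self.step_up, self.maximum)
        return self.limit

    def decay(self):
        """Periodic step back down towards the initial limit."""
        if self.limit > self.initial:
            self.limit -= 1
        return self.limit


model = ClientsPerQueryModel()
accepted = [model.add_client("example.com", "A") for _ in range(11)]
print(accepted.count(True), accepted.count(False))  # 10 1
print(model.fetch_done("example.com", "A"))         # 15
print(model.decay())                                # 14
```

In this sketch the 11th client is dropped, the successful answer raises the limit from 10 to 15, and a later decay step brings it back down by one.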

Thus the significant configuration option here for shedding load when servers for zones are unreachable is max-clients-per-query.

You need to have max-clients-per-query large enough to handle the situation under normal processing, where you have a lot of duplicate queries for the same name, so that a reasonable number of them can wait until that name is resolved and placed in cache.

The other side of this is that if your server is stormed by duplicate queries from different clients, you want named to be dropping most of these queries - so you don't want max-clients-per-query to be too large either.

clients-per-query is not incremented unless the queries for which the clients were waiting (with some of them dropped) result in an answer being added to cache

This is subtle. Under normal operation, clients-per-query is automatically tuned up and down (never exceeding max-clients-per-query) so that it can accommodate normal spikes in client queries for popular names. However, when the nameservers for a specific domain fail to respond or become unreachable, the queued clients will not be answered successfully. This is exactly the abnormal situation that clients-per-query is designed to protect against, so the value of clients-per-query is not raised, even if some clients are dropped.

Most clients that don't get an answer will retry the query, so a max-clients-per-query that is too small for an unexpected but genuine peak in 'same' client queries shouldn't cause more than a short delay in query resolution. In particular, once the in-progress query is resolved by the server (and assuming that the authoritative servers aren't being silly and providing an answer with TTL 0), the answer is added to cache so that query retries can be answered immediately. If no answer could be obtained and the eventual response to the waiting clients was SERVFAIL, this too is added to cache, with a cache lifetime controlled by the configuration option servfail-ttl (default 1s).

Recursive Client Rate Limiting supplements clients-per-query, but both should be in use.

Recursive Client Rate Limiting is applied to unique fetches. clients-per-query limiting handles the situation where many clients make identical queries (they will generate between them a single fetch, but all of these clients are added to the list of recursive clients). Recursive Client Rate Limiting will not protect your server resources in this situation.
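To illustrate why a limit on unique fetches does not see this kind of load, here is a minimal sketch that groups cache-missing queries by name and type. The function name, the tuple layout and the client identifiers are hypothetical and chosen only for illustration. It does not model named's internals or apply any limit; it only counts.

```python
from collections import defaultdict


def summarize(queries):
    """Count fetches and waiting clients for queries that all missed the cache.

    queries: iterable of (client_id, name, rtype) tuples.
    """
    fetches = defaultdict(list)  # (name, rtype) -> clients waiting on that one fetch
    for client_id, name, rtype in queries:
        fetches[(name, rtype)].append(client_id)
    queue_sizes = [len(clients) for clients in fetches.values()]
    return {
        "unique_fetches": len(fetches),
        "recursive_clients": sum(queue_sizes),
        "largest_queue": max(queue_sizes, default=0),
    }


# 500 different clients all asking for the same name and type
storm = [(f"client-{i}", "example.com", "A") for i in range(500)]
print(summarize(storm))
# {'unique_fetches': 1, 'recursive_clients': 500, 'largest_queue': 500}
```

A limit that counts fetches sees only one item here, while the recursive clients list holds 500 entries. The per-name/type queue length is the quantity that clients-per-query limiting bounds.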

Cache prefetch should prevent most multiple-client fetches

When a client query results in a cache miss, a fetch (or series of fetches) will be launched and the client is added to the recursive clients list to await the conclusion of this process. A popular name is likely to be queried by other clients while the cache is being repopulated. However, with prefetch enabled, a client query received for that name just before it expires from cache initiates an early cache refresh, with the objective of updating the RRsets ahead of their demise. For more information, see Early refresh of cache records (cache prefetch) in BIND.