Using RAMdisks and other similar volatile storage for the leases file
It can be tempting to use high-speed but volatile storage for the dhcpd leases file in an attempt to improve performance where there are I/O bottlenecks. Often this is accompanied by a plan to routinely copy the leases file to hard disk, for example via crond. The intent is to have a "reasonably up-to-date database" to recover from. This would be great if it worked that way, but it doesn't.
Both DHCP and the failover protocol operate using the fundamental presumption that lease information has been synched to disk before making replies over the protocol wire. Therefore, what is in the leases file is regarded as accurate and representative. So what can go wrong if DHCP is restarted with an inaccurate (older) version of the leases file?
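The "sync before reply" presumption can be illustrated with a minimal sketch. This is not dhcpd's actual code; the function name commit_then_reply, the callback, and the record text are hypothetical and only show the ordering: write the record, force it to storage, and only then answer.

```python
import os
import tempfile

def commit_then_reply(leases_path, lease_record, send_reply):
    """Append a lease record, force it to storage, then send the reply."""
    with open(leases_path, "a", encoding="ascii") as f:
        f.write(lease_record)
        f.flush()              # push Python's buffer to the operating system
        os.fsync(f.fileno())   # ask the OS to push the data to the device
    # Only now is the client (or failover peer) told about the lease.
    send_reply()

# Demo with a throwaway file; the record text is illustrative only.
demo_path = os.path.join(tempfile.mkdtemp(), "dhcpd.leases")
replies = []
commit_then_reply(demo_path, "lease 192.0.2.10 { }\n",
                  lambda: replies.append("ACK"))
```

On a RAMdisk the sync step returns almost immediately, which is where the speed-up comes from, but nothing has reached durable storage, so the promise implied by the reply no longer holds after a crash.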
- Using an old lease state structure for an active lease could cause the server to expire it early upon restarting, and subsequently offer it to a different client, causing an IP address conflict and taking both clients off the network.
- In particular with failover, the protocol channel does not resynch the lease database; it works on the assumption that the peer's recorded lease state hasn't changed unless there is a protocol-level message adjusting state, so a "partially recovered" lease database actually creates lease database inconsistency between the peers.
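The first failure mode above can be reproduced with a toy model. The sketch below assumes a simplified lease table (address mapped to client and expiry time on an artificial clock); the times, the address, and the names Lease and offerable are hypothetical and do not reflect dhcpd internals.

```python
from dataclasses import dataclass

@dataclass
class Lease:
    client: str
    ends: int  # expiry time in seconds on a toy clock

def offerable(leases, ip, now):
    """An address can be offered if no unexpired lease is recorded for it."""
    lease = leases.get(ip)
    return lease is None or lease.ends <= now

ip = "192.0.2.10"

# t=0: client A is granted a 3600 s lease; the cron copy runs right after.
live = {ip: Lease("client-A", ends=3600)}
snapshot = {addr: Lease(l.client, l.ends) for addr, l in live.items()}

# t=1800: client A renews. Only the live (RAMdisk) copy sees the new expiry.
live[ip].ends = 1800 + 3600

# t=4000: the server crashes, the RAMdisk is lost, and it restarts from
# the snapshot. Client A still holds a valid lease until t=5400.
now = 4000
restored = snapshot
conflict = offerable(restored, ip, now) and not offerable(live, ip, now)
print("address conflict possible:", conflict)
```

The restored server believes the lease ended at t=3600 and is free to offer the address to another client, while client A is still legitimately using it.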
If you are not in a failover situation, then you are at least better off having a partially recovered lease database than starting with a completely empty one; it reduces the chances of addressing conflicts.
However, if you have a server failure and lose the most recent version of the leases file in a failover situation, then it's actually preferable to "fault" the lease database and rely on the partner to have maintained a complete database. The only downside this has on restart is having to wait through MCLT delays associated with that operation.
We recommend using RAMdisk or other non-recoverable media for the leases database only as a temporary measure, e.g. for diagnostic purposes to confirm the source of a performance problem or bottleneck, or as an interim solution while waiting on the installation of battery-backed RAID or other storage media with adequate sync-rate performance.
Even then, for an interim solution, it might be preferable instead to raise the lease-time if that is administratively permitted (and won't needlessly starve the lease pools), as this will directly lower the load placed on the servers and may bring them below the storage performance limit.
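The effect of lease-time on write load can be estimated with rough arithmetic. The sketch below assumes that clients renew at about half the lease time (the default T1 renewal timer in RFC 2131) and that each renewal costs one synced write to the leases file; the client count and lease times are hypothetical, and real traffic will be burstier than this average.

```python
def renewals_per_second(clients, lease_time_s, renew_fraction=0.5):
    """Average renewals per second if every client renews at
    renew_fraction of its lease time (assumption: 0.5, the default T1)."""
    return clients / (lease_time_s * renew_fraction)

# Hypothetical site: 50,000 active clients.
for lease_time in (600, 3600, 86400):
    rate = renewals_per_second(50_000, lease_time)
    print(f"lease-time {lease_time:>6} s -> about {rate:8.2f} synced writes/s")
```

Under these assumptions the average write rate is inversely proportional to the lease time, so doubling the lease time halves the number of synced writes the storage has to absorb.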