vServer memory and process count checks

Recently we have seen apache2 and PHP memory usage in Debian Lenny (stable) vservers grow higher than expected. Restarting apache2 was one option (and is probably a good idea once a day anyway, just to clear the APC cache). But we wanted a mechanism that would monitor resource usage inside the vserver context and generate a nagios alert if resident memory (RSS) usage rose above, say, 85% of its limit, or if the number of processes rose above a certain limit.

Normally a server or vserver has a fairly static average number of processes unless it runs a batch-oriented workload. Tracking the number of processes is useful because under certain kinds of resource starvation, such as high CPU load and low system (host) memory, cron jobs running in the background can fork off copies of programs that pile up, which makes the CPU and host memory problem worse. For this reason we set vserver hard limits on the RSS and NPROC attributes of each vserver.

Getting nagios to monitor RSS and NPROC usage on a per-vserver basis from the host is a fairly simple job for perl. We open a pipe to the vserver-stat program to gather the currently active vservers along with their current RSS and process usage, then run the vlimit program for each vserver to gather its current limits. After that, it's a matter of some basic math: divide usage by limit and compare the result against the alert threshold.
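
The threshold math can be sketched roughly as follows. This is a Python illustration, not the perl script described above. It assumes the usage and limit figures have already been extracted from vserver-stat and vlimit (parsing is left out because it depends on those tools' output format). The function names and the 95% critical threshold are hypothetical; only the 85% figure comes from the text. The exit codes follow the usual nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
# Standard nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

# Severity order used when combining checks: CRITICAL beats WARNING,
# which beats UNKNOWN, which beats OK. (An assumed policy, not from the post.)
SEVERITY = {OK: 0, UNKNOWN: 1, WARNING: 2, CRITICAL: 3}


def check_usage(label, used, limit, warn_pct=85.0, crit_pct=95.0):
    """Compare current usage with a hard limit; return (exit_code, message).

    `used` and `limit` must be in the same unit (both memory figures, or
    both process counts). The threshold defaults are illustrative.
    """
    if limit is None or limit <= 0:
        # Without a finite limit there is no percentage to compute.
        return UNKNOWN, f"{label} UNKNOWN - no hard limit set"
    pct = 100.0 * used / limit
    if pct >= crit_pct:
        code = CRITICAL
    elif pct >= warn_pct:
        code = WARNING
    else:
        code = OK
    return code, f"{label} {STATE_NAMES[code]} - {used}/{limit} ({pct:.1f}%)"


def check_vserver(name, rss_used, rss_limit, nproc_used, nproc_limit):
    """Run the RSS and NPROC checks for one vserver; the worst state wins."""
    results = [
        check_usage(f"{name} RSS", rss_used, rss_limit),
        check_usage(f"{name} NPROC", nproc_used, nproc_limit),
    ]
    worst = max((code for code, _ in results), key=SEVERITY.get)
    return worst, "; ".join(msg for _, msg in results)
```

A nagios plugin built on this would print the message on a single line and exit with the returned code, for example by passing the result of check_vserver to print() and sys.exit().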