ArsDigita Keepalive for AOLserver
by Ben Adida and Philip Greenspun, part of ArsDigita Free Tools
ArsDigita Keepalive is a system that monitors your web services at
regular, short intervals, and takes action to resolve problems found.
If Keepalive fails to reach a page, depending on how many consecutive
previous failures it has seen and the configuration parameters, it will
take one of the following actions:
- nothing except decrement a counter
- send email to a previously defined group of addresses
- execute a shell command, presumably one that restarts the stuck
service
Keepalive is built using AOLserver (free) and takes advantage of
AOLserver's built-in scheduler (like Unix cron but lighter weight) and
its Tcl API (which includes a call to HTTP GET a page from another
server).
However, unlike most of our AOLserver products, you don't need to
install an RDBMS in order to use Keepalive. Web servers generally get
stuck because of problems with the RDBMS, so a monitor that depended on
an RDBMS would be self-defeating.
Although we generally use Keepalive to monitor AOLserver-based Web
services, it will work fine to monitor any HTTP service on a Unix
machine.
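Because the check amounts to an HTTP GET compared against an expected return, any HTTP service can be probed. Keepalive does this through AOLserver's Tcl API; the Python sketch below only illustrates the idea. The names `fetch` and `page_is_healthy`, the 10-second timeout, and the choice to compare the whole body after stripping whitespace are assumptions for illustration, not Keepalive's actual behavior.

```python
import http.client
import urllib.error
import urllib.request


def fetch(url, timeout=10.0):
    """Return the response body as text, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Unreachable host, HTTP error status, timeout, or malformed URL.
        return None


def page_is_healthy(url, expected, fetcher=fetch):
    """True when the page could be fetched and its body matches `expected`.

    Assumption: surrounding whitespace is ignored in the comparison.
    """
    body = fetcher(url)
    return body is not None and body.strip() == expected.strip()
```

The `fetcher` argument exists only so the comparison can be exercised without a network connection, for example `page_is_healthy("http://host/test", "success", fetcher=lambda url: "success\n")`.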
Installation
- Download keepalive-1999.tar.gz (last updated December 14, 1998)
- cd /web
- tar xvf keepalive-1999.tar.gz (creates /web/keepalive)
- Create an AOLserver whose page root is /web/keepalive and whose
private Tcl directory is /web/keepalive/tcl
- Edit /tcl/defs.tcl to set the main Keepalive parameters.
- Edit the keepalive_init procedure in /tcl/init.tcl to add monitors in
the same way as the sample monitor is added. The arguments to
new_monitor are, in order:
  - name
  - URL of test page
  - expected return
  - shell command to execute on failure
  - Tcl list of admin email addresses to notify
  - (optional) number of retries before the failure action is executed.
  This defaults to 5.
  - (optional) threshold of retries below which email is sent. This
  defaults to the number of retries, meaning that Keepalive will send
  mail if there is any problem. If you feel that you're getting spammed
  about problems that work themselves out, set this to some lower
  number; we find that 4 and 2 are good numbers.
- You're done! Start your server.
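The retries and threshold arguments can be read as a countdown. The Python sketch below is one plausible reading of the description above, not Keepalive's Tcl code: the counter starts at the number of retries, each failure decrements it, email goes out while the counter is below the threshold, and the shell command runs when the counter reaches zero. The class name `Monitor`, the action labels, the reset on success, and the exact boundary conditions are all assumptions.

```python
class Monitor:
    """Countdown state for one monitored page (hypothetical names)."""

    def __init__(self, name, retries=5, threshold=None):
        self.name = name
        self.retries = retries
        # Default threshold equals retries, so the first failure already mails.
        self.threshold = retries if threshold is None else threshold
        self.remaining = retries

    def record(self, healthy):
        """Update the counter; return "ok", "nothing", "email" or "restart"."""
        if healthy:
            self.remaining = self.retries  # assumed: a success resets the count
            return "ok"
        self.remaining -= 1
        if self.remaining <= 0:
            self.remaining = self.retries  # assumed: start over after a restart
            return "restart"
        if self.remaining < self.threshold:
            return "email"
        return "nothing"


monitor = Monitor("sample", retries=4, threshold=2)
print([monitor.record(False) for _ in range(4)])
```

Under these assumptions, four consecutive failures with retries 4 and threshold 2 yield nothing, nothing, email, restart, while the defaults mail on every failure before the restart.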
Which Shell Command?
You might well ask yourself which shell command will restart a Web
server. It depends. In the case of AOLserver, we run the server by
inserting a line in /etc/inittab:
nsjw:34:respawn:/home/nsadmin/bin/nsd -i -c /home/nsadmin/nsd.ini
which tells Unix to restart nsd if it should die for any reason. Thus
Keepalive just needs to kill the existing nsd process. The problem is
that a Web server must be owned by root if it is to grab port 80, and
Keepalive can't kill a root-owned server unless it too runs as root (a
security risk). The solution at ArsDigita is to build a setuid Perl
script that Keepalive can call:
restart-aolserver
#!/usr/local/bin/perl
## Restarts an AOLserver. Takes as its only argument the name of the server to kill.
## This is a perl script because it needs to run setuid root,
## and perl has fewer security gotchas than most shells.
$ENV{'PATH'} = '/sbin:/bin';
# uncomment this stuff if you're at an installation where a server
# takes a long time to restart or keeps important state
# if (scalar(@ARGV) == 0) {
#     die "Don't run this without any arguments!";
# }
$server = shift;
$< = $>; # set realuid to effective uid (root)
sub getpids {
    ## get the PIDs of all nsd processes started with $server.ini
    my $ps_output = `/usr/bin/ps -ef`;
    my @pids;
    foreach (split(/\n/, $ps_output)) {
        ## \Q...\E and \. make the server name and the dot match literally
        next unless /^\s*\S+\s+(\d+).*nsd.*\Q$server\E\.ini/;
        push(@pids, $1);
    }
    @pids;
}
@pids = &getpids;
print "Killing ", join(" ", @pids), "\n";
kill 'KILL', @pids;
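The matching step in getpids can be tried out without killing anything. This Python sketch applies the same pattern to a captured `ps -ef` listing; the function name `matching_pids` and the sample listing are made up for illustration, and the sketch escapes the server name so that any regex metacharacters in it are taken literally.

```python
import re


def matching_pids(ps_output, server):
    """Return PIDs of nsd processes whose command line names <server>.ini."""
    pattern = re.compile(r"^\s*\S+\s+(\d+).*nsd.*" + re.escape(server) + r"\.ini")
    pids = []
    for line in ps_output.splitlines():
        match = pattern.match(line)
        if match:
            pids.append(int(match.group(1)))
    return pids


# Made-up sample in the usual `ps -ef` column layout (UID PID PPID ...).
SAMPLE = """\
     UID   PID  PPID  C    STIME TTY      TIME CMD
    root   412     1  0 09:14:02 ?        0:07 /home/nsadmin/bin/nsd -i -c /home/nsadmin/nsd.ini
    root   977     1  0 09:14:05 ?        0:03 /home/nsadmin/bin/nsd -i -c /home/nsadmin/staging.ini
 nsadmin  1203  1100  0 10:02:11 pts/3    0:00 vi notes.txt
"""

print(matching_pids(SAMPLE, "staging"))
```

With this sample, only the process started with staging.ini is selected; the header line and the unrelated editor process are skipped because they do not match the pattern.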
License
This is open-source software, copyright 1998 ArsDigita, LLC and licensed
under the GNU General Public License.
Support and Customization
If you want an extended version of Keepalive or support, you can hire
the programmer of your choice to install, maintain, and customize
Keepalive. ArsDigita offers support as well, but probably not at a
price that you'd be happy to pay.
ben@arsdigita.com
Reader's Comments
I think using aolserver to keep another aolserver alive is a bit risky. Even without the RDBMS - if both your aolservers hang for the same reason then what? I would feel much more comfortable using cron and a shell script.
-- David Cotter, September 21, 2001