Monitoring
- User directory: none
- Admin directory: /admin/monitoring/
- Procedures: /tcl/watchdog-defs, /tcl/cassandracle-defs
- Binaries: /bin/aolserver-errors.pl
The Big Picture
The ArsDigita Community System has an integrated set of monitoring
tools.
Parameters
Monitoring parameters are centralized in the monitoring section of the .ini file. Add a new PersontoNotify entry for each person who should receive monitoring alerts.
[ns/server/yourservername/acs/monitoring]
; People to email for alerts
PersontoNotify=nerd1@yourservicename.com
;PersontoNotify=nerd2@yourservicename.com
; location of the watchdog perl script
WatchDogParser=/web/yourservicename/bin/aolserver-errors.pl
; watchdog frequency in minutes
WatchDogFrequency=15
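As an illustration of how repeated PersontoNotify entries can be read, here is a minimal Python sketch. It is not the ACS code (the ACS reads these parameters through AOLserver); the function name read_monitoring_params and the sample text are made up for this example. It parses the section by hand because a repeated key has to yield a list of values.

```python
def read_monitoring_params(ini_text,
                           section="ns/server/yourservername/acs/monitoring"):
    """Collect parameters from one section, keeping every value of a repeated key.

    Hypothetical helper for illustration only; not part of the ACS.
    """
    params = {}
    current = None
    for raw in ini_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and ';' comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]  # entering a new section
            continue
        if current == section and "=" in line:
            key, value = line.split("=", 1)
            params.setdefault(key.strip(), []).append(value.strip())
    return params


# Sample text modeled on the configuration above, with both recipients enabled.
SAMPLE = """\
[ns/server/yourservername/acs/monitoring]
; People to email for alerts
PersontoNotify=nerd1@yourservicename.com
PersontoNotify=nerd2@yourservicename.com
; watchdog frequency in minutes
WatchDogFrequency=15
"""

params = read_monitoring_params(SAMPLE)
recipients = params.get("PersontoNotify", [])
frequency = int(params.get("WatchDogFrequency", ["0"])[0])
print(recipients, frequency)
```

Lines starting with a semicolon are treated as comments, so a commented-out PersontoNotify line does not add a recipient.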
Current page requests - monitor
The "current page request" section (linked from /admin/monitoring/)
will produce a report like the following.
There are a total of 8 requests being served right now (to 8 distinct IP addresses). Note that this number seems to include only the larger requests. Smaller requests, e.g., for files and in-line images, seem to come and go too fast for this program to catch.
conn # | client IP | state | method | url | n seconds | bytes
17899 | 212.252.145.38 | running | GET | /photo/pcd3255/chappy-store-31.4.jpg | 59 | 158544
18185 | 38.27.213.213 | running | GET | /wtr/thebook/html | 21 | 0
18247 | 171.210.228.91 | running | GET | /photo/nikon/nikon-reviews | 15 | 0
18367 | 209.86.54.190 | running | GET | /bboard/image | 8 | 34228
18454 | 199.174.160.135 | running | GET | /photo/pcd1669/treptower-big-view-51.4.jpg | 1 | 34376
18464 | 207.100.29.220 | running | ? | ? | 1 | 0
18468 | 216.214.210.53 | running | GET | /chat/js-refresh | 0 | 0
18481 | 216.34.106.252 | running | GET | /monitor | 0 | 0
This report will inform you which users are waiting on pages from your server.
In the report above, users asking for large images or pages are waiting. This
is normal because some users have very slow connections.
If you see the same .tcl or .adp file often, especially with the longest wait times, it is likely that the script is extremely slow or is hogging database handles. You should
- Examine and fix the page
- Use ad_return_if_another_copy_is_running to limit the number of copies of the page that can run concurrently (set the limit a few below the size of your database pool).
This will prevent multiple executions of that page from bringing down your whole web service.
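The idea behind capping concurrent copies of a page can be sketched in Python. This is only an illustration of the general technique (a non-blocking counting semaphore) and says nothing about how ad_return_if_another_copy_is_running is actually implemented. The class name CopyLimiter and the "busy" return value are hypothetical.

```python
import threading


class CopyLimiter:
    """Let at most max_copies executions of a page run at once (illustrative only)."""

    def __init__(self, max_copies):
        self._slots = threading.BoundedSemaphore(max_copies)

    def run(self, page):
        # Refuse immediately instead of queueing, so waiting requests
        # do not pile up behind a slow page.
        if not self._slots.acquire(blocking=False):
            return "busy"
        try:
            return page()
        finally:
            self._slots.release()


limiter = CopyLimiter(max_copies=1)


def slow_page():
    # The nested call stands in for a second request arriving
    # while the first copy is still running.
    return limiter.run(lambda: "inner")


print(limiter.run(slow_page))     # the nested request is turned away
print(limiter.run(lambda: "ok"))  # the slot is free again afterwards
```

Refusing extra copies right away, rather than making them wait, is what keeps one slow page from tying up every database handle in the pool.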
If you see a large number of requests from the same IP address, it is
likely that a poorly-designed spider is attacking your web service. To stop it,
ban that IP address from your system.
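Both checks described above (the same page appearing with long wait times, and many requests from one IP address) can be applied to a saved copy of the report. The following Python sketch assumes the pipe-separated row layout shown in the sample report. The function names and thresholds are hypothetical, and the rows are copied from the sample above.

```python
from collections import Counter

# A few rows copied from the sample report above.
REPORT = """\
17899 | 212.252.145.38 | running | GET | /photo/pcd3255/chappy-store-31.4.jpg | 59 | 158544
18185 | 38.27.213.213 | running | GET | /wtr/thebook/html | 21 | 0
18464 | 207.100.29.220 | running | ? | ? | 1 | 0
18481 | 216.34.106.252 | running | GET | /monitor | 0 | 0
"""


def parse_rows(text):
    """Turn report lines into dicts, skipping anything that is not a data row."""
    rows = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) != 7 or not fields[0].isdigit():
            continue  # header, note text, or stray separator line
        conn, ip, state, method, url, seconds, nbytes = fields
        rows.append({"ip": ip, "url": url,
                     "seconds": int(seconds), "bytes": int(nbytes)})
    return rows


def busy_ips(rows, threshold):
    """IP addresses with at least `threshold` simultaneous requests."""
    counts = Counter(r["ip"] for r in rows)
    return sorted(ip for ip, n in counts.items() if n >= threshold)


def slow_urls(rows, min_seconds):
    """URLs that have been running for at least `min_seconds`."""
    return sorted({r["url"] for r in rows if r["seconds"] >= min_seconds})


rows = parse_rows(REPORT)
print(busy_ips(rows, threshold=2))
print(slow_urls(rows, min_seconds=20))
```

A single snapshot says little on its own. The pattern to look for is the same URL or IP address showing up across several successive reports.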
Cassandracle (Oracle)
Cassandracle is a Web-based monitor for an Oracle installation.
The goal is that, at a glance, a novice Oracle DBA ought to be
able to identify problems and find pointers to relevant reference materials.
To use Cassandracle in your installation, you will need to
give the web service's database
user read access to some core Oracle tables.
- Log into Oracle via sqlplus
- Execute:
SQL> connect internal
- Run the commands in /sql/cassandracle.sql
- Execute
SQL> grant ad_cassandracle to username;
Configuration
This is a simple section with information about the current machine
and connection. The information provided is pretty sparse and should
expand in the future.
WatchDog (Error log)
Every WatchDogFrequency minutes, the service's error logs will be scanned. If errors are found, they will be emailed to everyone configured as a PersontoNotify. The administration pages also have a tool to search the error log for errors.
If WatchDogFrequency is 0, the error logs won't be scanned regularly.
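The scan-and-notify cycle can be sketched in Python as follows. This is not the logic of aolserver-errors.pl. The log line layout (three bracketed fields followed by a severity word), the sample log text, and the function names are all assumptions made for illustration. The frequency is treated as minutes, following the comment in the sample configuration.

```python
import re

# Assumed line shape: [timestamp][pid.tid][thread] Severity: message
ERROR_LINE = re.compile(r"^\[[^\]]*\]\[[^\]]*\]\[[^\]]*\] Error: (.*)$")


def find_errors(log_text):
    """Return the message part of every line logged at Error severity."""
    errors = []
    for line in log_text.splitlines():
        match = ERROR_LINE.match(line)
        if match:
            errors.append(match.group(1))
    return errors


def scan_interval_seconds(watchdog_frequency):
    """Convert the configured frequency (minutes) to seconds; 0 disables scanning."""
    if watchdog_frequency <= 0:
        return None
    return watchdog_frequency * 60


# Made-up log excerpt for the example.
SAMPLE_LOG = """\
[01/Jan/2000:10:00:00][100.1][-main-] Notice: server starting
[01/Jan/2000:10:05:12][100.4][-conn0-] Error: database handle unavailable
[01/Jan/2000:10:05:13][100.4][-conn0-] Notice: request finished
"""

errors = find_errors(SAMPLE_LOG)
print(errors)
print(scan_interval_seconds(15), scan_interval_seconds(0))
```

A real watchdog would also remember how far it read on the previous run, so that each error is reported only once.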
Registered Filters and Scheduled Procedures
The procs ad_register_filter and ad_schedule_proc
are wrappers around the corresponding ns_ calls, which allow
us to more carefully track what's happening on the server and when.
/admin/monitoring/filters shows
which filters are called for which URLs and methods, and /admin/monitoring/scheduled-procs
shows which procedures are scheduled to be called in the future.
Monitoring top output
Every TopFrequency seconds, the program located at TopLocation will be run, and its output parsed into overall and process-specific statistics. /admin/monitoring/top shows the historical results of this periodic call and lets you see, over the web, the current output of top on the machine your service is running on.
If TopFrequency is 0, top won't be run regularly.
See also ad-monitoring-defs.tcl and monitoring.sql.
- Caveat 1: top output varies a great deal from one implementation to
another, and this monitor currently only recognizes the syntax of the
default Solaris 7 top function.
- Caveat 2: Neither the regular monitoring nor running top from a
web page is possible if you are running the ACS in a chrooted
environment, since top looks in /proc, a sensitive directory.
teadams@arsdigita.com
jsalz@mit.edu