ArsDigita Reporte
by
Terence Collins and
Philip Greenspun
Suppose that you have a bunch of Web services all running on one Unix
box. ArsDigita Reporte is a separate reports server that, every night
at 2:00 am, will
- grab the logs from your production servers
- run analog
to produce a daily report for the day before. As a side effect of
running analog, ArsDigita Reporte
- accumulates a cache for weekly and monthly reports
- does reverse DNS lookups and accumulates a cache for those (so that
you can see how many users came from France or .edu domains)
- if appropriate, runs analog over the cache files to produce a weekly
or monthly report
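For orientation, the nightly pass might look something like the shell sketch below. All hostnames and paths here are hypothetical, and the real distribution drives this from scheduled Tcl procedures inside AOLserver rather than from a standalone script; check your analog documentation for the exact command-line syntax in your version.

```shell
#!/bin/sh
# Hypothetical sketch of Reporte's nightly pass; the real work is done
# by scheduled Tcl procs inside AOLserver, not by a shell script.
service=photonet                                  # hypothetical service name
yesterday=$(date -d yesterday +%Y%m%d 2>/dev/null \
            || date -v-1d +%Y%m%d)                # YYYYMMDD, GNU or BSD date
log="/reporte/logs/$service/access.$yesterday"    # hypothetical paths
out="/reporte/reports/$service/$(date +%Y)/daily-$yesterday.html"

# 1. grab yesterday's log from the production box
scp "web1:/web/$service/log/access.$yesterday" "$log"

# 2. run analog over it; OUTFILE is an analog configuration command,
#    which analog accepts on the command line via +C
analog +C"OUTFILE $out" "$log"
```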
The idea is that you give each customer a username/password pair for
your reports server, which demands authentication. Once a user
authenticates himself, he is redirected to the appropriate section of
the server and sees only his service's statistics.
ArsDigita Reporte is not in any way innovative as far as log analysis
goes. We provide no more and no less capability than analog (which is
one of the best tools, is free, and comes with source code so you can
change it, which is what we did to cope with some weirdness in how
analog treats some of our .tcl page loads). What ArsDigita Reporte
saves you from having to do is write a bunch of Unix cron jobs and set
up servers to deliver the reports to your clients.
ArsDigita Reporte simultaneously accomplishes Year 2000-compliance and
scalability by keeping each year's reports in a separate directory named
"1998" or "1999" or whatever. This keeps any particular Unix directory
from filling up with thousands of files and making poor old Unix dig
around too much for the file you need.
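A sketch of that per-year layout, with hypothetical paths, computing the four-digit year directory before writing a report:

```shell
#!/bin/sh
# One directory per four-digit year keeps any single directory from
# accumulating thousands of report files, and four digits keep the
# naming unambiguous past 1999.
service=photonet                          # hypothetical service name
year=$(date +%Y)                          # e.g. 1998, 1999, 2000, ...
dir="/reporte/reports/$service/$year"     # hypothetical path
mkdir -p "$dir"                           # create the year directory as needed
```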
ArsDigita Reporte should work with any Web server program (i.e., you
can be using Apache or Netscape Enterprise or whatever as your
user-visible Web server). To run the reporting server itself, you will
need AOLserver (Reporte is written as Tcl procedures that run under
it) and a copy of analog.
Some background for ArsDigita Reporte may be obtained by reading
Philip and Alex's Guide to Web Publishing.
This is free software, copyright ArsDigita and distributed under the GNU General Public
License.
Known Problems
January 4, 1999: a new version has been released with 0-padded month
fields. If you have downloaded the old distribution, you can make
these fixes yourself:
- in tcl/daily-report-procs.tcl:
add the lines
set todays_month [format %02d $todays_month]
set month [format %02d $month]
before
set report_date "${two_digit_year}${month}$monthday"
- in tcl/scheduled-procs.tcl:
add the line
set month [format %02d $month]
before
set report_date "${two_digit_year}${month}$monthday"
- in tcl/defs.tcl:
under the procs six_digit_time_string_from_ns_time and eight_digit_time_string_from_ns_time, change these lines from:
set month [expr 1 + [parsetime_from_seconds mon $time]]
to:
set month [format %02d [expr 1 + [parsetime_from_seconds mon $time]]]
In the proc month_length_in_days, add the line
set numeric_month [string trimleft $numeric_month 0]
before the line
switch $numeric_month {
(otherwise a 0-padded month such as "08" will not match the switch's
unpadded cases)
- in service-index.tcl:
change this line from:
set one_month_ago_time [expr $todays_time - ([month_length_in_days [expr [string range [six_digit_time_string_from_ns_time $todays_time] 2 3] - 1]] * $one_day_in_seconds)]
to these lines (the extra check handles January, when the previous
month is December):
set last_month_numeric [expr ([string range [six_digit_time_string_from_ns_time $todays_time] 2 3] - 1)]
if {$last_month_numeric==0} {set last_month_numeric 12}
set one_month_ago_time [expr $todays_time - ([month_length_in_days $last_month_numeric ] * $one_day_in_seconds)]
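To see why the padding matters: report_date is built by string concatenation, so a one-digit month yields ambiguous filenames that sort out of order. The same %02d idiom the Tcl fixes use, shown here in shell:

```shell
#!/bin/sh
# Without padding, January 4, 1999 would concatenate to "9914";
# with %02d padding it becomes "990104", which is unambiguous and
# collates correctly against later dates like "991204".
month=1
monthday=04
two_digit_year=99
month=$(printf %02d "$month")             # 1 -> "01"
report_date="${two_digit_year}${month}${monthday}"
echo "$report_date"                       # prints 990104
```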
There aren't any known problems with ArsDigita Reporte per se.
However, analog itself has been known to dump core. Sometimes just
pulling the command line from the AOLserver error log (where it is
written regardless of whether analysis produces any errors) and
rerunning it from a shell will get you out of your difficulty. Anyway,
Reporte will send you email if it has any problem overnight. And
Reporte will try again during the day if your machine was down during
the night. So it should be reasonably robust.
My (philg's) main complaint with the whole system is that reverse DNS
lookup makes log analysis crawl. Analyzing 24 hours of photo.net logs
(500,000 hits) takes between 3 and 5 hours! This is being done in the
early morning hours on a 4-CPU HP RISC box with 4 GB of RAM, so
presumably the bottleneck is entirely reverse DNS.
tcollins@arsdigita.com
and philg@mit.edu