ACS Request Processor
by
Jon Salz
- Tcl procedures: /packages/acs-core/request-processor-procs.tcl
This document describes the request processor, a series of Tcl procedures
which handles every single HTTP request made to an AOLserver running ACS.
The Big Picture
In the early days of the Web, before the dawn of database-driven, dynamic
content, web servers maintained a very straightforward mapping between URLs
and files. In response to a request for a particular URL, servers just
prepended a document root path like /web/arsdigita/www to the URL,
serving up the file named by that path.
This is no longer the case: the process of responding to a request involves
many more steps than simply resolving a path and delivering a file. Serving an
ACS page involves (at the very least) reading security information from HTTP cookies, extracting
subcommunity information from the URL, calling filters and
registered procedures, invoking the abstract URL system to determine
which file to serve, and then actually serving the file.
The traditional way to realize this process was to register a plethora of filters and
procedures, but there were several problems with this approach:
- It was difficult to deliver files which didn't physically live
underneath the page root (in the /www directory), as in
the case of pages associated with packages.
- It was difficult to control the order in which filters were
executed.
- It was difficult to determine which code was executed for requests
to which URLs.
- If something broke, it was difficult to determine which filter was
causing the problem.
- Scoped requests needed to be handled specially, with
ns_register_procs for each possible URL prefix
(/groups, /some-group-type, /some-group-name, etc.).
- Filters and registered procedures were strictly URL-based, so they
broke under scoping, e.g. a procedure registered for /download/files/
wouldn't work for requests under
/groups/some-group-name/download/files/62/smug.jpg.
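The scoping problem in the last item can be illustrated in plain Tcl. This is only a sketch: it uses string match as a stand-in for AOLserver's own URL matching, and the glob pattern is an illustrative assumption built from the example paths above.

```tcl
# Illustrative glob pattern for a procedure registered on /download/files/
# (string match stands in for AOLserver's URL matching here).
set pattern "/download/files/*"

set plain_url  "/download/files/62/smug.jpg"
set scoped_url "/groups/some-group-name/download/files/62/smug.jpg"

# The unscoped URL matches the pattern; the scoped one does not,
# because the group prefix comes before /download/files/.
set plain_matches  [string match $pattern $plain_url]
set scoped_matches [string match $pattern $scoped_url]
puts "plain: $plain_matches, scoped: $scoped_matches"
```

Run in tclsh, this prints "plain: 1, scoped: 0".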
To solve these problems, in ACS 3.3 we introduced a unified
request processor implementing the series of actions above. It is written
purely in Tcl as a single procedure (not a mess of ns_register_filters
and ns_register_procs), allowing us a great deal of control over exactly what
happens when we deliver a response to an HTTP request. We also introduced new APIs,
ad_register_filter and ad_register_proc, analogous to existing
AOLserver APIs (ns_register_filter and ns_register_proc) but tightly
integrated into the request processor.
Steps in the Pipeline
The request processor is registered with AOLserver as a preauth filter.
In fact, it is the only filter ever registered with AOLserver. We've killed off
ns_register_filter and ns_register_proc (see the API section below).
The request processor performs the following steps:
- Global initialization. Initialize the ad_conn global variable,
which contains information about the connection (see ad_conn below).
- Library reloading. If the package manager has been instructed to reload any
*-procs.tcl files, source them.
Also examine any files registered to be watched (via the package manager);
if any have been changed, source them as well.
- Developer support. Call the hook to the developer support subsystem,
if it exists, to save information about the active connection.
- Host header checking. Check the HTTP Host header to make
sure it's what we expect it to be. If the Host header is present
but differs from the canonical server name (as reported by ns_info location),
issue an HTTP redirect using the correct, canonical server name.
For instance, if someone accesses the URL http://arsdigita.com/pages/customers,
we redirect them to http://www.arsdigita.com/pages/customers since the canonical
host name for the server is www.arsdigita.com, not arsdigita.com.
- Security handling. Examine the security cookies, ad_browser_id
and ad_session_id. If either is invalid or not present at all,
issue a cookie and note information about the new browser or session in the
database.
- Examine the URL for subcommunity information. If the URL belongs to
a subcommunity (e.g. /groups/Boston/address-book/ belongs to
the Boston subcommunity), strip the subcommunity information from the URL
and save it in the environment to be later accessed by ad_conn.
This is not implemented in ACS 3.3.
- Invoke applicable filters registered with ad_register_filter. Use
the URL with subcommunity stripped as the string to be matched against
patterns passed to ad_register_filter, e.g. if a
filter is registered on /download/files/, it will be applied
on URLs like /groups/Boston/download/files/* since we stripped
/groups/Boston from the URL in the step above.
- If an applicable procedure has been registered with ad_register_proc,
invoke it. As in the previous step, match using the URL minus
subcommunity information. If such a procedure is found, the process
terminates here.
- Resolve the URL to a file in the filesystem, again ignoring subcommunity
information.
First resolve the path:
  - If a prefix of the URL has been registered with
    rp_register_directory_map, map to the associated directory
    in the filesystem. For example, if we've called
    rp_register_directory_map "apm" "acs-core" "apm-docs"
    then all requests under the /apm URL stub are mapped to
    the acs-core package directory apm-docs/www,
    and all requests under /admin/apm are mapped to the
    acs-core package directory apm-docs/admin-www.
  - If a prefix of the URL corresponds to a package key registered
    with the package manager, then map to the www or
    admin-www directory in that package. For example, if
    there's a package named address-book, then
    requests under /address-book are mapped to the
    /packages/address-book/www directory, and requests under
    /admin/address-book are mapped to
    /packages/address-book/admin-www.
  - Otherwise, just prepend the document root (usually something like
    /web/arsdigita/www) to the URL, just like AOLserver
    always used to do.
Now check to see if the path refers to a directory without a trailing
slash, e.g. a request to http://www.arsdigita.com/address-book.
If this is the case, issue a redirect to the directory with the trailing
slash, e.g. http://www.arsdigita.com/address-book/. This is
necessary so the browser will properly resolve relative HREFs.
Next determine which particular file to serve. If our file name is
filename, check to see if any files exist which are named
filename.*, i.e. we try automatically adding an extension to the
file name. If the URL ends in a trailing slash, then no file name is provided
so we look for an index.* file instead.
Give precedence to particular extensions in the order specified
by the ExtensionPrecedence parameter, e.g. in general
prefer to serve .tcl files rather than .adp files.
- Call the appropriate handler for the file type.
  - If it's a Tcl (.tcl) file, source it;
    if it's an ADP (.adp) file, parse it.
    In either case, if the script or ADP built a document using the
    documents API, invoke the document handler
    to route the document to the appropriate master template.
  - If it's an HTML file, use ad_serve_html to serve it,
    including a comment link as appropriate.
  - If it's a file with some extension registered with
    rp_register_extension_handler, use that handler to serve
    the file. For example, if I call
    rp_register_extension_handler jsp jsp_handler
    then if the file is a JSP (.jsp) file, the jsp_handler
    routine will be called and expected to return a page.
  - If it's some form of static content (like a GIF, or JPEG, or anything
    else besides the file types listed above), just serve the file verbatim,
    guessing the MIME type from the suffix.
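A few of the steps above (the Host header check, the index.* rule for URLs ending in a slash, and extension precedence) can be sketched in plain Tcl. This is a minimal illustration, not the request processor's actual code: the helper names are hypothetical, the canonical location is assumed to look like http://www.arsdigita.com, the precedence list is assumed to be a Tcl list of extensions without dots, and falling back to the first candidate when no preferred extension matches is also an assumption.

```tcl
# Hypothetical helper: return the URL to redirect to when the Host header
# differs from the canonical location, or "" when no redirect is needed.
proc sketch_host_redirect {location host_header url} {
    # Drop the scheme to get the canonical host name.
    regsub {^[a-z]+://} $location "" canonical_host
    if {[string equal $host_header ""]
        || [string equal -nocase $host_header $canonical_host]} {
        return ""
    }
    return "${location}${url}"
}

# Hypothetical helper: the path, minus extension, to look for under a root
# directory. A URL ending in a slash names no file, so look for index.*.
proc sketch_base_path {root url} {
    if {[string match "*/" $url]} {
        return "${root}${url}index"
    }
    return "${root}${url}"
}

# Hypothetical helper: pick one of several candidate files (such as a glob
# on "<base>.*" might return) using a precedence list of extensions,
# most preferred first.
proc sketch_pick_by_precedence {candidates precedence} {
    foreach ext $precedence {
        foreach path $candidates {
            if {[string equal [file extension $path] ".$ext"]} {
                return $path
            }
        }
    }
    # Assumption: with no preferred extension present, take the first candidate.
    return [lindex $candidates 0]
}

puts [sketch_host_redirect "http://www.arsdigita.com" "arsdigita.com" "/pages/customers"]
puts [sketch_base_path "/web/arsdigita/www" "/address-book/"]
puts [sketch_pick_by_precedence \
    [list "/web/arsdigita/www/address-book/index.adp" \
          "/web/arsdigita/www/address-book/index.tcl"] \
    [list tcl adp html]]
```

With these inputs the three calls print http://www.arsdigita.com/pages/customers, /web/arsdigita/www/address-book/index, and /web/arsdigita/www/address-book/index.tcl, matching the examples in the steps above.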
The request processor always returns filter_return, i.e. it is always solely responsible
for returning a page. Essentially it commandeers the entire AOLserver
page delivery process.
The exception is that you can specifically instruct the request
processor to leave certain URLs alone and return filter_ok
instead, by adding parameters of the form
LeaveAloneUrl=/cgi-bin/*
to the [ns/server/yourservername/acs/request-processor] section
of your server .ini file. Add more LeaveAloneUrl= lines for more URL patterns.
The patterns use glob matching (Tcl syntax is string match $pattern $actual_url).
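The glob check described above might look roughly like the following. This is a sketch: the helper name is hypothetical, and the pattern list is written out by hand here rather than read from the server .ini file.

```tcl
# Hypothetical helper: return 1 if the URL matches any LeaveAloneUrl
# pattern (glob matching via string match), 0 otherwise.
proc sketch_leave_alone_p {patterns url} {
    foreach pattern $patterns {
        if {[string match $pattern $url]} {
            return 1
        }
    }
    return 0
}

# Patterns as they might appear on LeaveAloneUrl= lines.
set patterns [list "/cgi-bin/*" "/php/*"]

puts [sketch_leave_alone_p $patterns "/cgi-bin/search.pl"]
puts [sketch_leave_alone_p $patterns "/address-book/index.tcl"]
```

The first call prints 1 (the URL would be left to AOLserver) and the second prints 0 (the request processor would handle it).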
API
As of ACS 3.3, ns_register_filter and ns_register_proc are
dead - trying to use them causes an error. Instead of these two procedures, you'll need
to use ad_register_filter and ad_register_proc, drop-in
replacements which provide the same functionality but are integrated into the request
processor.
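As a rough illustration, registering a filter through the new API might look like the sketch below. The filter name and URL pattern are hypothetical, and because the arguments AOLserver passes to a filter script depend on the server version, the filter here simply accepts whatever it is given. The registration is guarded so the file can also be sourced outside ACS, where ad_register_filter does not exist.

```tcl
# Hypothetical filter body. It accepts any arguments, since the exact
# calling convention depends on the AOLserver version in use.
proc sketch_download_filter {args} {
    ns_log Notice "sketch_download_filter called with: $args"
    # Let request processing continue.
    return filter_ok
}

# Register only when running inside ACS, where ad_register_filter is defined.
if {[llength [info commands ad_register_filter]] > 0} {
    ad_register_filter preauth GET /download/files/* sketch_download_filter
}
```

Code written for ns_register_filter should need only the command name changed, since the argument order is the same.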
- ad_register_filter when method URLpattern script [ args ... ]
  ad_register_proc [ -noinherit f ] method URL procname [ args ... ]
  Drop-in replacements for the obsoleted routines ns_register_filter
  and ns_register_proc. See the AOLserver documentation for syntax.
- ad_conn which
  Returns information about the current connection (analogous to
  ns_conn). Allowed values for which are:
url: same as ns_conn url.
(In future versions of ACS with subcommunity support,
this will return the portion of the URL after any subcommunity
information. For instance, for a URL request to
/groups/Boston/address-book/, it would return /address-book/.)
canonical_url: returns a URL, minus extension, which is consistent
no matter what URL is used to access a resource. For instance, if any
of the following URLs are used to access the address book:
Boston office:
/address-book/
/address-book/index
/address-book/index.tcl
then the canonical URL for each might be
/address-book/index
This URL is useful whenever a consistent, unique identifier for a resource is
necessary, e.g. storing URLs for general comments.
full_url: similar to ad_conn canonical_url,
except that the extension of the file is included. For instance, in the case of
the example above, ad_conn full_url would return
/address-book/index.tcl
file: returns the absolute path in the filesystem
of the file which is being delivered. If the request does not correspond
to a file (e.g. is a registered filter or procedure), returns an empty string.
extension: returns the extension of the file
which is being delivered (equivalent to [file extension [ad_conn file]]).
If the request does not correspond to a file (e.g. is a registered filter or procedure),
returns an empty string.
browser_id: returns the client's browser ID
(see the document security and session-tracking).
session_id: returns the client's session ID
(see the document security and session-tracking).
These values are set in the request-processor. Some values are
not available at various points in the page-serving process; for instance,
ad_conn file is not available in
preauth/postauth filters since path resolution is not performed
until after filters are invoked.
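The relationship between full_url, canonical_url, and extension described above can be illustrated with Tcl's file command. This is a sketch of that relationship using the example value from this section, not the request processor's actual code; inside an ACS page the values would come from calls such as ad_conn full_url.

```tcl
# Example value taken from the full_url description above.
set full_url "/address-book/index.tcl"

# Dropping the extension gives the form described for canonical_url.
set canonical_url [file rootname $full_url]

# The extension as file extension reports it (leading dot included).
set extension [file extension $full_url]

puts "canonical_url: $canonical_url"
puts "extension: $extension"
```

Run in tclsh, this prints "canonical_url: /address-book/index" and "extension: .tcl".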
Parameters
[ns/server/yourservername/request-processor]
; Log lots of timestamped debugging messages?
DebugP=0
; URL sections exempt from Host header checks and security/session handling.
; (can specify an arbitrary number).
SystemURLSection=SYSTEM
; URLs that the request-processor should simply pass on to AOLserver to handle
; good candidates for these are URLs handled by nscgi.so or nsphp.so
;LeaveAloneUrl=/cgi-bin/*
;LeaveAloneUrl=/php/*
Future Improvements
Integration with subcommunities.
jsalz@mit.edu