ACS Request Processor

by Jon Salz


  • Tcl procedures: /packages/acs-core/request-processor-procs.tcl
This document describes the request processor, a series of Tcl procedures which handles every single HTTP request made to an AOLserver running ACS.

The Big Picture

In the early days of the Web, before the dawn of database-driven, dynamic content, web servers maintained a very straightforward mapping between URLs and files. In response to a request for a particular URL, servers just prepended a document root path like /web/arsdigita/www to the URL, serving up the file named by that path.

This is no longer the case: the process of responding to a request involves many more steps than simply resolving a path and delivering a file. Serving an ACS page involves (at the very least) reading security information from HTTP cookies, extracting subcommunity information from the URL, calling filters and registered procedures, invoking the abstract URL system to determine which file to serve, and then actually serving the file.

The traditional way to realize this process was to register a plethora of filters and procedures, but there were several problems with this approach:

  • It was difficult to deliver files which didn't physically live underneath the page root (in the /www directory), as in the case of pages associated with packages.
  • It was difficult to control the order in which filters were executed.
  • It was difficult to determine which code was executed for requests to which URLs.
  • If something broke, it was difficult to determine which filter was causing the problem.
  • Scoped requests needed to be handled specially, with ns_register_procs for each possible URL prefix (/groups, /some-group-type, /some-group-name, etc.).
  • Filters and registered procedures were strictly URL-based, so they broke under scoping, e.g. a procedure registered for /download/files/ wouldn't work for requests under /groups/some-group-name/download/files/62/smug.jpg.
To solve these problems, in ACS 3.3 we introduced a unified request processor implementing the series of actions above. It is written purely in Tcl as a single procedure (not a mess of ns_register_filters and ns_register_procs), allowing us a great deal of control over exactly what happens when we deliver a response to an HTTP request. We also introduced new APIs, ad_register_filter and ad_register_proc, analogous to existing AOLserver APIs (ns_register_filter and ns_register_proc) but tightly integrated into the request processor.

Steps in the Pipeline

The request processor is registered with AOLserver as a preauth filter. In fact, it is the only filter ever registered with AOLserver: we've killed off ns_register_filter and ns_register_proc (see the API section below). The request processor performs the following steps:
  1. Global initialization. Initialize the ad_conn global variable, which contains information about the connection (see ad_conn below).

  2. Library reloading. If the package manager has been instructed to reload any *-procs.tcl files, source them. Also examine any files registered to be watched (via the package manager); if any have been changed, source them as well.

  3. Developer support. Call the hook to the developer support subsystem, if it exists, to save information about the active connection.

  4. Host header checking. Check the HTTP Host header to make sure it's what we expect it to be. If the Host header is present but differs from the canonical server name (as reported by ns_info location), issue an HTTP redirect using the correct, canonical server name.

    For instance, if someone accesses the URL http://arsdigita.com/pages/customers, we redirect them to http://www.arsdigita.com/pages/customers since the canonical host name for the server is www.arsdigita.com, not arsdigita.com.

  5. Security handling. Examine the security cookies, ad_browser_id and ad_session_id. If either is invalid or not present at all, issue a cookie and note information about the new browser or session in the database.

  6. Examine the URL for subcommunity information. If the URL belongs to a subcommunity (e.g. /groups/Boston/address-book/ belongs to the Boston subcommunity), strip the subcommunity information from the URL and save it in the environment to be later accessed by ad_conn.

    This is not implemented in ACS 3.3.

  7. Invoke applicable filters registered with ad_register_filter. Use the URL with subcommunity stripped as the string to be matched against patterns passed to ad_register_filter, e.g. if a filter is registered on /download/files/, it will be applied on URLs like /groups/Boston/download/files/* since we stripped /groups/Boston from the URL in the step above.

  8. If an applicable procedure has been registered with ad_register_proc, invoke it. As in the previous step, match using the URL minus subcommunity information. If such a procedure is found, the process terminates here.

  9. Resolve the URL to a file in the filesystem, again ignoring subcommunity information.

    First resolve the path:

    1. If a prefix of the URL has been registered with rp_register_directory_map, map to the associated directory in the filesystem. For example, if we've called
      rp_register_directory_map "apm" "acs-core" "apm-docs"
      
      then all requests under the /apm URL stub are mapped to the acs-core package directory apm-docs/www, and all requests under /admin/apm are mapped to the acs-core package directory apm-docs/admin-www.

    2. If a prefix of the URL corresponds to a package key registered with the package manager, then map to the www or admin-www directory in that package. For example, if there's a package named address-book, then requests under /address-book are mapped to the /packages/address-book/www directory, and requests under /admin/address-book are mapped to /packages/address-book/admin-www.

    3. Otherwise, just prepend the document root (usually something like /web/arsdigita/www) to the URL, just like AOLserver always used to do.

    Now check to see if the path refers to a directory without a trailing slash, e.g. a request to http://www.arsdigita.com/address-book. If this is the case, issue a redirect to the directory with the trailing slash, e.g. http://www.arsdigita.com/address-book/. This is necessary so the browser will properly resolve relative HREFs.

    Next determine which particular file to serve. If our file name is filename, check to see if any files exist which are named filename.*, i.e. we try automatically adding an extension to the file name. If the URL ends in a trailing slash, then no file name is provided so we look for an index.* file instead. Give precedence to particular extensions in the order specified by the ExtensionPrecedence parameter, e.g. in general prefer to serve .tcl files rather than .adp files.

  10. Call the appropriate handler for the file type.

    1. If it's a Tcl (.tcl) file, source it; if it's an ADP (.adp) file, parse it. In either case, if the script or ADP built a document using the documents API, invoke the document handler to route the document to the appropriate master template.

    2. If it's an HTML file, use ad_serve_html to serve it, including a comment link as appropriate.

    3. If it's a file with some extension registered with rp_register_extension_handler, use that handler to serve the file. For example, if I call
      rp_register_extension_handler jsp jsp_handler
      
      then if the file is a JSP (.jsp) file, the jsp_handler routine will be called and expected to return a page.

    4. If it's some form of static content (like a GIF, or JPEG, or anything else besides the file types listed above), just serve the file verbatim, guessing the MIME type from the suffix.
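
The path-resolution and file-selection rules of step 9 can be sketched in plain Tcl. This is an illustration only, not the code in request-processor-procs.tcl. The proc names rp_sketch_resolve and rp_sketch_pick_file, the dir_map and package_keys variables, and the assumption that packages live under /web/arsdigita/packages are all hypothetical, and the /admin variants are left out for brevity.

```tcl
# Hypothetical stand-ins for state the real request processor would obtain
# from rp_register_directory_map and the package manager.
set acs_root /web/arsdigita
array set dir_map {apm {acs-core apm-docs}}   ;# URL stub -> package, subdirectory
set package_keys {address-book acs-core}

# Map a URL to a filesystem path (the file extension is not resolved here).
proc rp_sketch_resolve {url} {
    global acs_root dir_map package_keys
    set parts [split [string trimleft $url /] /]
    set stub  [lindex $parts 0]
    set rest  [join [lrange $parts 1 end] /]
    if {[info exists dir_map($stub)]} {
        # Rule 1: an explicitly registered directory map.
        set pkg    [lindex $dir_map($stub) 0]
        set subdir [lindex $dir_map($stub) 1]
        return "$acs_root/packages/$pkg/$subdir/www/$rest"
    }
    if {[lsearch -exact $package_keys $stub] >= 0} {
        # Rule 2: the URL stub is a package key.
        return "$acs_root/packages/$stub/www/$rest"
    }
    # Rule 3: fall back to the document root.
    return "$acs_root/www$url"
}

# Choose among candidate files using an ExtensionPrecedence-style list.
# Candidates with an unlisted extension rank after all listed ones.
proc rp_sketch_pick_file {candidates precedence} {
    set best ""
    set best_rank -1
    foreach f $candidates {
        set ext  [string trimleft [file extension $f] .]
        set rank [lsearch -exact $precedence $ext]
        if {$rank < 0} { set rank [llength $precedence] }
        if {$best_rank < 0 || $rank < $best_rank} {
            set best $f
            set best_rank $rank
        }
    }
    return $best
}

rp_sketch_resolve /apm/index          ;# resolved by rule 1
rp_sketch_resolve /address-book/      ;# resolved by rule 2
rp_sketch_resolve /pages/customers    ;# resolved by rule 3
rp_sketch_pick_file {index.adp index.tcl index.html} {tcl adp html}
```

In a real lookup the candidate list would come from the filesystem, for example from glob -nocomplain on the resolved path followed by ".*". The precedence list is written here as a Tcl list; the actual format of the ExtensionPrecedence parameter is not specified in this document.
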
The request processor always returns filter_return, i.e. it is always solely responsible for returning a page. Essentially it commandeers the entire AOLserver page delivery process.

The exception is that you can specifically instruct the request processor to leave certain URLs alone and return filter_ok instead, by adding parameters of the form LeaveAloneUrl=/cgi-bin/* to the [ns/server/yourservername/acs/request-processor] section of your server .ini file. Add more LeaveAloneUrl= lines for more URL patterns. The patterns use glob matching (in Tcl terms, string match $pattern $actual_url).
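
The glob matching described above can be tried out in plain Tcl. The following is a minimal sketch, not the request processor's own code; the proc name rp_sketch_leave_alone_p is hypothetical, and the patterns are the two that appear commented out in the Parameters section.

```tcl
# Return 1 if the URL matches any of the leave-alone glob patterns, else 0.
proc rp_sketch_leave_alone_p {url patterns} {
    foreach pattern $patterns {
        if {[string match $pattern $url]} {
            return 1
        }
    }
    return 0
}

set patterns {/cgi-bin/* /php/*}
rp_sketch_leave_alone_p /cgi-bin/search.pl $patterns   ;# matches /cgi-bin/*
rp_sketch_leave_alone_p /address-book/     $patterns   ;# matches no pattern
```

Two properties of string match are worth keeping in mind when writing patterns: * also matches slashes, so /cgi-bin/* covers /cgi-bin/tools/search.pl, and the bare URL /cgi-bin (no trailing slash) does not match /cgi-bin/*.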

API

As of ACS 3.3, ns_register_filter and ns_register_proc are dead - trying to use them causes an error. Instead of these two procedures, you'll need to use ad_register_filter and ad_register_proc, drop-in replacements which provide the same functionality but are integrated into the request processor.
ad_register_filter when method URLpattern script [ args ... ]
ad_register_proc [ -noinherit f ] method URL procname [ args ... ]
Drop-in replacements for the obsoleted routines ns_register_filter and ns_register_proc. See the AOLserver documentation for syntax.
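
As a rough illustration of the two signatures above, the following sketch registers one filter and one procedure. Everything named download_* is hypothetical, and the callback argument lists (conn args why for the filter, conn context for the procedure) are assumed to follow the AOLserver conventions for ns_register_filter and ns_register_proc; check the AOLserver documentation for your version. The stand-in definitions at the top exist only so the sketch also loads outside ACS.

```tcl
# Stand-ins so the sketch also loads under plain tclsh. Inside ACS the real
# ad_register_filter and ad_register_proc already exist and this is skipped.
if {[llength [info commands ad_register_filter]] == 0} {
    set registered [list]
    proc ad_register_filter {when method pattern script args} {
        lappend ::registered [list filter $when $method $pattern $script]
    }
    proc ad_register_proc {args} {
        lappend ::registered [concat proc $args]
    }
}

# Hypothetical filter callback (assumed AOLserver-style argument list).
proc download_log_filter {conn args why} {
    # ... record the download here ...
    return filter_ok   ;# let the rest of the pipeline run
}

# Hypothetical registered procedure that would produce the response itself.
proc download_serve_file {conn context} {
    # ... locate the file and return it to the client ...
}

ad_register_filter preauth GET /download/files/* download_log_filter
ad_register_proc GET /download/files download_serve_file
```

Because the request processor matches against the URL with subcommunity information stripped, registrations like these would not need to be repeated per subcommunity once that feature is available.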

ad_conn which
Returns information about the current connection (analogous to ns_conn). Allowed values for which are:

  • url: same as ns_conn url. (In future versions of ACS with subcommunity support, this will return the portion of the URL after any subcommunity information. For instance, for a URL request to /groups/Boston/address-book/, it would return /address-book/.)

  • canonical_url: returns a URL, minus extension, which is consistent no matter what URL is used to access a resource. For instance, if any of the following URLs are used to access the address book:
    /address-book/
    /address-book/index
    /address-book/index.tcl
    
    then the canonical URL for each might be
    /address-book/index
    This URL is useful whenever a consistent, unique identifier for a resource is necessary, e.g. storing URLs for general comments.

  • full_url: similar to ad_conn canonical_url, except that the extension of the file is included. For instance, in the case of the example above, ad_conn full_url would return
    /address-book/index.tcl
  • file: returns the absolute path in the filesystem of the file which is being delivered. If the request does not correspond to a file (e.g. is a registered filter or procedure), returns an empty string.

  • extension: returns the extension of the file which is being delivered (equivalent to [file extension [ad_conn file]]). If the request does not correspond to a file (e.g. is a registered filter or procedure), returns an empty string.

  • browser_id: returns the client's browser ID (see the document security and session-tracking).

  • session_id: returns the client's session ID (see the document security and session-tracking).

These values are set in the request-processor. Some values are not available at various points in the page-serving process; for instance, ad_conn file is not available in preauth/postauth filters since path resolution is not performed until after filters are invoked.
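
The relationship between full_url, canonical_url, and extension in the address-book example can be illustrated with Tcl's built-in file command. This is only a sketch of how the values relate, not of how the request processor computes them, and ad_conn itself is not called so that the lines run in plain tclsh.

```tcl
# The value ad_conn full_url returns in the address-book example above.
set full_url /address-book/index.tcl

# Dropping the extension gives the canonical form; the extension alone,
# including its leading dot, is what file extension reports.
set canonical_url [file rootname  $full_url]   ;# /address-book/index
set extension     [file extension $full_url]   ;# .tcl
```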

Parameters

[ns/server/yourservername/request-processor]
; Log lots of timestamped debugging messages?
DebugP=0
; URL sections exempt from Host header checks and security/session handling.
; (can specify an arbitrary number).
SystemURLSection=SYSTEM
; URLs that the request-processor should simply pass on to AOLserver to handle
; good candidates for these are URLs handled by nscgi.so or nsphp.so
;LeaveAloneUrl=/cgi-bin/*
;LeaveAloneUrl=/php/*
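
To show how SystemURLSection could interact with the Host header check in step 4, here is a minimal sketch. It is not the actual implementation: the proc name rp_sketch_host_redirect is hypothetical, a "URL section" is assumed to mean the first path component, and comparing host names case-insensitively is a choice made for this sketch.

```tcl
# Return the URL to redirect to, or "" if no redirect is needed.
#   location         canonical server location, e.g. http://www.arsdigita.com
#   host_header      value of the Host header, or "" if the client sent none
#   url              request path, e.g. /pages/customers
#   system_sections  values of SystemURLSection, e.g. SYSTEM
proc rp_sketch_host_redirect {location host_header url system_sections} {
    # Assumption: a URL section is the first path component.
    set section [lindex [split [string trimleft $url /] /] 0]
    if {[lsearch -exact $system_sections $section] >= 0} {
        return ""   ;# exempt from Host header checks
    }
    # Pull the host (and optional port) out of the canonical location.
    if {![regexp {^[a-z]+://(.+)$} $location match canonical_host]} {
        return ""   ;# unparseable location: do nothing in this sketch
    }
    if {[string equal $host_header ""] ||
        [string equal -nocase $host_header $canonical_host]} {
        return ""   ;# header absent or already canonical
    }
    return "$location$url"
}

rp_sketch_host_redirect http://www.arsdigita.com arsdigita.com /pages/customers {SYSTEM}
```

With these arguments the sketch returns http://www.arsdigita.com/pages/customers, matching the redirect described in step 4; a request under /SYSTEM/ with the same Host header would return an empty string.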

Future Improvements

Integration with subcommunities.


jsalz@mit.edu