Developers Guide

to the ArsDigita Community System by Philip Greenspun

This document contains general guidelines for building new ACS pages. It should be used in conjunction with the procedure-by-procedure documentation.

Writing a Whole Module

If you're writing a new module, see the short instructions in /arsdigita/doc/custom, and a detailed guide in /arsdigita/doc/writing-a-module.

Documentation

Every procedure that you expect to be called externally (i.e., by Tcl code that doesn't reside in the same file) should be documented with a doc string. Instead of using proc, you should use proc_doc. This results in the procedure-by-procedure documentation. If you're a new programmer, you might want to read the common errors list.

You call proc_doc with a string in between the args and the body:

proc_doc plus2 {x} "returns the result of adding 2 to its argument" {
    return [expr {$x + 2}]
}
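
If you're wondering how a doc string can sit between the args and the body, here is a minimal sketch of a wrapper in the same spirit. It is not the ACS implementation of proc_doc; the names documented_proc and doc_strings are hypothetical stand-ins, chosen so the example runs in a plain tclsh.

```tcl
# Hypothetical stand-in for proc_doc: define the procedure as usual and
# remember its doc string in a global array keyed by procedure name.
proc documented_proc {name arglist doc body} {
    global doc_strings
    set doc_strings($name) $doc
    proc $name $arglist $body
}

documented_proc plus2 {x} "returns the result of adding 2 to its argument" {
    return [expr {$x + 2}]
}

puts [plus2 5]
puts $doc_strings(plus2)
```

Because the doc strings end up in one array, a documentation page could be generated by looping over the array names.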

Magic Numbers in the Code (Parameters)

Don't put magic numbers in your code with site-specific stuff, e.g., whether a particular feature is enabled or disabled. Add a parameter or a section, if necessary, to your ad.ini file in the /parameters directory. Instead of


proc bboard_users_can_add_topics_p {} {
    return 0
}

put the setting in the parameter file and have the procedure read it with ad_parameter:

[ns/server/photonet/acs/bboard]
; can a user start a new bboard
UserCanAddTopicsP=0
...

proc bboard_users_can_add_topics_p {} {
    return [ad_parameter UserCanAddTopicsP bboard]
}
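
The lookup can be pictured as a two-level table keyed by section and parameter name. The sketch below uses a hypothetical stand-in, demo_parameter, backed by a Tcl dict so that it runs outside AOLserver; on a real site the value comes from the .ini file through ad_parameter. The fallback default shown here is an assumption of the sketch.

```tcl
# Hypothetical in-memory copy of settings that a site would keep in its .ini file.
set demo_parameters [dict create bboard [dict create UserCanAddTopicsP 0]]

# Stand-in for ad_parameter: look up a name within a section, falling back
# to a default when the parameter is absent.
proc demo_parameter {name section {default ""}} {
    global demo_parameters
    if {[dict exists $demo_parameters $section $name]} {
        return [dict get $demo_parameters $section $name]
    }
    return $default
}

proc bboard_users_can_add_topics_p {} {
    return [demo_parameter UserCanAddTopicsP bboard 0]
}

puts [bboard_users_can_add_topics_p]
```

The point of the indirection is that turning the feature on for one site means editing a line in a parameter file, not a line of Tcl shared by every site.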

Naming

If it is a parameter, put it in /web/yourdomain/parameters/yourdomain.ini and capitalize words, e.g., "SystemName" (for consistency with the AOLserver .ini file).

If you define a Tcl procedure that is site-specific, name it with a prefix that is site-specific. E.g., the EDF scorecard.org site uses "score_". A site to sell khakis uses "k_".

If it is a community procedure, name it ad_ and put it somewhere in the /tcl directory.

If it is a utility procedure, name it util_ and put it in the /home/nsadmin/modules/tcl/utilities file.

Naming (files)

In general, we like to name sequences of forms in the following way:

foo (or foo.adp) -- presents a form to the user
foo-2 (or foo-2.adp) -- presents a confirmation page to the user
foo-3 (or foo-3.adp) -- actually does the database transaction, may redirect rather than present anything to the user

As far as the "foo" goes (the actual name of the file), we like to use object-verb. So you might have "user-update" for a form that updates a record in the users table. Try not to be redundant with the directory name. So if you have a bunch of scripts in a directory called "users", the script to look at one user would just be "one" rather than "user".

Naming (columns)

The following column names are ArsDigita Community System standards.

creation_date - date row was created
creation_user - user_id of the creator
creation_ip_address - ip address of the creator
last_modified - date row was last modified
last_modifying_user - user_id of the user last modifying the row
modified_ip_address - ip address of the last modifying user
html_p (t,f) - is the text in html?
approved_p (t,f) - is this row approved?

Style Guide

Dynamically generated pages should always be signed with an email address. This should be the correct address for a user complaining that a page does not contain the correct content. Thus in general the correct footer for a user page will be either ad_footer or gc_footer or calendar_footer, etc. Admin pages should end with ad_admin_footer. Then if the webmaster encounters a bug or a page that doesn't do what is needed, he or she can complain to a programmer.

If we're building a system where we can't get any better theories from the publisher, we design pages to have the following structure:

title
context bar (Yahoo-style navigation)
HR
out-of-flow options such as help or admin (aligned off to the right, just underneath the HR)
the meat of the page
navigation and "more info" options
HR
email address signature

What about smaller style issues? Here are some general principles we've developed so far at ArsDigita:

don't smash stuff against the left margin of the page (hard to read, esp. on screens crowded with windows); use BLOCKQUOTE or UL/LI tags to put a white border between the left edge of the browser and the content on the page. Note that this also applies to tables of info. BLOCKQUOTE then TABLE.
try to have no more than one form button per page and certainly no more than one per form. Some sub-issues:
- you will never have a RESET or CANCEL button. A user who mis-mouses should not lose typed-in data. If they change their mind, let them back up, navigate away via the context bar, or reload.
- a SUBMIT button should never be called "submit". It should say something like "Create Account" or "Proceed" (if a multi-form pipeline) or "Search"
- the submit button should be centered within the page, underneath the table of form inputs
speaking of forms, if you have hidden or pass-through variables, put them right up at the top of the form immediately underneath the FORM tag.

Using the Database Intelligently

Virtually every page on every site that ArsDigita has ever built (1) is generated by a computer program, (2) has access to the relational database that sits behind the site. Take advantage of these facts.

How? Follow two principles:

Show users as much information as quickly as possible; don't make them click down
Don't offer users dead links.

These sound obvious but most programmers' instincts are to produce pages that behave like static files. For example, in a bond trading site the top-level page might offer links to "portfolio, trading, and open orders". This could have been done with a file! Instead, why not query Oracle to find out the total value of the portfolio (show users as much info as possible)? Or query Oracle to find out if there are any open orders; if there are just a few, display them in-line, otherwise don't have a hyperlink anchored with "open orders" (show as much info as possible; don't offer dead links).

In a photography classifieds page, don't show categories that haven't any current ads (no dead links) and count up the ads in each category for display next to the link (as much info as possible). Isn't this GROUP BY that sequentially scans the classifieds table kind of expensive for a top-level page on a non-commercial site? Sure. But the solution is to use Memoize_for_Awhile to cache results in virtual memory.
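
To make the caching idea concrete, here is a minimal sketch of time-limited memoization in plain Tcl. It is not the Memoize_for_Awhile source; memoize_for_seconds and count_ads_by_category are hypothetical names, and the "query" is a stand-in that just counts how often it runs.

```tcl
# Hypothetical helper: evaluate $script at most once per $max_age seconds
# and serve the cached result in between. The cache is a pair of global arrays.
proc memoize_for_seconds {script max_age} {
    global memo_value memo_time
    set now [clock seconds]
    if {[info exists memo_time($script)] && $now - $memo_time($script) < $max_age} {
        return $memo_value($script)
    }
    set memo_value($script) [uplevel #0 $script]
    set memo_time($script) $now
    return $memo_value($script)
}

# Stand-in for the expensive GROUP BY query; it counts how often it runs.
set query_runs 0
proc count_ads_by_category {} {
    global query_runs
    incr query_runs
    return {cameras 12 lenses 7}
}

memoize_for_seconds count_ads_by_category 600
memoize_for_seconds count_ads_by_category 600
puts "query ran $query_runs time(s)"
```

The second call is served from the cache, so the stand-in query runs only once. On a multi-threaded server the cache would have to live somewhere all threads can see it; this sketch ignores that.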

Computer time is cheap; user time is precious. Work the server hard on behalf of each and every user. Support the user with personalization. Find out what is going to be down a hyperlink before offering it to the user. Buy extra processors as the community grows.

Give the User Dimensional Controls

See the /ticket module for how a large body of data may be rendered manageable by giving the user several dimensions along which to select.

Pages that accept user input

Pages that accept user input should first call ad_read_only_p to make sure that the Oracle database isn't being maintained in such a way that updates would be lost. Right at the top of a file that offers the user a form or stuffs something into the db, put

if {[ad_read_only_p]} {
    ad_return_read_only_maintenance_message
    return
}

Systems should be designed so that a user clicking submit twice will not result in a duplicate database entry. The fix is that we generate the unique primary key in the form or the approval page (better since it will still work if the user reuses the form). See the ecommerce chapter of my book for a discussion of how this works. See the news subsystem for a simple implementation example.
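
Here is a small sketch of why a key generated with the form defeats the double click. An array stands in for a table with a primary key, and a counter stands in for a database sequence; insert_news and new_key are hypothetical names, not the news subsystem's code.

```tcl
# Stand-in for a database sequence. In the real system the key would be
# generated when the form (or approval page) is built and passed along as
# a hidden form variable.
set next_id 1000
proc new_key {} {
    global next_id
    return [incr next_id]
}

# Stand-in for an insert into a table with a primary key on news_id.
# Returns 1 if the row was inserted, 0 if the key already exists (the
# database would report this as a primary key violation).
proc insert_news {news_id title} {
    global news
    if {[info exists news($news_id)]} {
        return 0
    }
    set news($news_id) $title
    return 1
}

set form_key [new_key]                       ;# generated once, with the form
puts [insert_news $form_key "Site relaunch"] ;# first click on submit
puts [insert_news $form_key "Site relaunch"] ;# second click on submit
puts [array size news]
```

Both clicks carry the same key, so the second insert is rejected and only one row exists. Had the key been generated in the page that does the insert, each click would have received a fresh key and created its own row.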

Systems should be designed so that they do something sensible with plain text and HTML. Add an "html_p" column to any table that accepts user input. Store the user input in unadulterated form in the database. Convert it to HTML on the fly if necessary when displaying (this consists of guessing where to stick in <P> tags and quoting greater-than or less-than signs). See the news subsystem for an example.
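
The on-the-fly conversion might look roughly like the following. This is a sketch under simple assumptions (blank lines mark paragraph breaks; only &, < and > need quoting), and render_user_text is a hypothetical name rather than an ACS utility.

```tcl
# Hypothetical helper: render stored user input as HTML.
# If html_p is "t" the text is passed through untouched; otherwise the
# characters &, < and > are quoted and blank lines become <p> breaks.
proc render_user_text {text html_p} {
    if {$html_p eq "t"} {
        return $text
    }
    set quoted [string map {& &amp; < &lt; > &gt;} $text]
    regsub -all {\n[ \t\r]*\n} $quoted "\n\n<p>\n\n" with_paragraphs
    return $with_paragraphs
}

puts [render_user_text "if a < b\n\nthen stop" f]
```

Because the database keeps the unadulterated input, the conversion rules can be improved later without touching stored rows.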

Pages that are broken

It is nice to email the host admin if things are wrong, but don't do it directly with ns_sendmail; use ad_notify_host_administrator (defined in /tcl/ad-monitor). This way, the host admin won't get email more than once every 15 minutes.
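
The "no more than once every 15 minutes" behavior amounts to remembering when the last message went out. The sketch below shows the idea only; it is not the source of ad_notify_host_administrator, and notify_rate_limited and record_message are hypothetical names. The sender is passed in as a command so the example runs without a mail server, and the clock can be overridden for testing.

```tcl
# Hypothetical helper: deliver a message at most once per $interval seconds.
# Returns 1 if the message was delivered, 0 if it was suppressed.
set last_notification_time ""
proc notify_rate_limited {sender message {interval 900} {now ""}} {
    global last_notification_time
    if {$now eq ""} { set now [clock seconds] }
    if {$last_notification_time ne "" && $now - $last_notification_time < $interval} {
        return 0
    }
    set last_notification_time $now
    {*}$sender $message
    return 1
}

# Stand-in for sending mail: just collect the messages in a list.
set sent {}
proc record_message {message} {
    global sent
    lappend sent $message
}

notify_rate_limited record_message "db connection failed" 900 1000
notify_rate_limited record_message "db connection failed" 900 1300
notify_rate_limited record_message "db connection failed" 900 1900
puts [llength $sent]
```

The middle call arrives 300 seconds after the first and is suppressed; the third arrives a full 900 seconds later and goes through.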

Adding an item when some are present

Suppose that you have a page that has to (1) show a list of some items, (2) offer the user the option of adding a new item of the same type as the items being displayed.

Our convention in the ACS is to present the existing items in a list (UL). Then we have a blank line (P tag). Then we have a new list item (LI) with a hyperlinked phrase like "add new item".
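
A sketch of a helper that emits this layout follows. The procedure name, the sample items, and the "item-add" URL are all made up for illustration.

```tcl
# Hypothetical helper: render existing items as a UL, then a blank line
# (P tag), then one more LI carrying the hyperlinked "add" phrase.
proc items_with_add_link {items add_url add_label} {
    set html "<ul>\n"
    foreach item $items {
        append html "<li>$item\n"
    }
    append html "<p>\n"
    append html "<li><a href=\"$add_url\">$add_label</a>\n"
    append html "</ul>\n"
    return $html
}

puts [items_with_add_link {"Nikon F5" "Canon EOS-3"} "item-add" "add new item"]
```

When the list is empty the same code still works: the user sees only the "add new item" entry.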

Content that can accept comments

In general, the purpose of Web content is to attract comments (see Chapter 1 of my book if you aren't convinced). That means whenever you're developing a new application within ACS you should give users the ability to contribute comments. This is already a system feature for static pages and the discussion forum. For miscellaneous areas, such as news and calendar, use this table from general_comments.sql:

create table general_comments (
	comment_id		integer primary key,
	on_what_id		integer not null,
	on_which_table		varchar(50),
	user_id			not null references users,
	comment_date		date,
	ip_address		varchar(50) not null,
	modified_date		date,
	content			clob,
	approved_p		char(1) default 't' check(approved_p in ('t','f'))
);

Note that it points to other tables via the on_what_id and on_which_table columns.

Email Alerts

To avoid sending undesired email, use the users_alertable view instead of the users table as a selection pool for generating alerts:

create or replace view users_alertable
as
select * 
 from users 
 where (on_vacation_until is null or 
        on_vacation_until < sysdate)
 and (deleted_p is null or deleted_p = 'f')
 and (email_bouncing_p is null or email_bouncing_p = 'f');

Distributing Maintenance

When distributing maintenance responsibility, use the permissions package. See the /gc/ module for an example of how this package may be used.

Auditing

Often a table will need to be audited, particularly if your client has hired a number of data entry people who are prone to make mistakes. Auditing a table allows you to see who made changes, when the changes were made, what changed, and the history of all states the data have been in (so the data are never lost).

Auditing a table consists of:

Adding a few columns to the table to record the time of update and the identity of the person making the update.
Creating a separate audit table that contains all old versions of the data in the table.
Creating a trigger which automatically stuffs a row into the audit table whenever the original table is modified.

The ACS has a number of auditing conventions which you should follow, as well as some utility procedures which can be used to display the history of all states a table (or set of tables) has been in. This is documented in the Audit Trail Package.

Adding Graphics

If you want to add graphics to your site without performing major surgery, the easiest thing to do is add illustrations. That is, put in pictures and drawings to give users a feeling of place. Avoid making graphics and buttons part of the user interface. It will make the site hard to use for people on slow links. It will make it harder to maintain the code. It will make it harder to offer a text-only version of the site.

The two places on photo.net where there are decorations like this are up in the headline (turning it into an HTML table) and also alongside lists of stuff. Procedures that support this are the following:

ad_decorate_side (in /tcl/ad-sidegraphics)
ad_decorate_top (in /tcl/ad-defs)

See http://photo.net/bboard/ for a demonstration of both in use.

Categorization

Suppose that you want to let each user manage a collection of items on the server. For example, at jobdirect.com, an employer searches among tens of thousands of student resumes and can pick out especially promising students to save for later scrutiny. You know that at least some users will pick hundreds of students via this mechanism and will need some way to organize them. However, new employer-users won't have any students on their "favorites" list and it is unnecessary to expose them to categorization machinery as they pick their first favorite student. Suppose that your categorization solution for power users is one layer of folders. When Joe Employer picks his first favorite student, should he really be hammered with a message: "You haven't set up any folders for favorite students yet. Please set up a folder first and then you can pick favorite students."

Remember Alan Cooper's adage that "No matter how cool your user interface, it would be better if there were less of it."

We applied this principle on jobdirect.com by suppressing the categorization machinery until the employer-user had picked at least 8 students. Categorization then appeared as an option when the user was viewing his or her list of favorite students (presumably this is the only time when the user might have been thinking "hey, this list is getting long"). Once the user had elected to switch over to the more complex categorization interface, future picks of favorite students would result in messages like "Oh, into which folder would you like us to put this resume?"
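
The decision itself is a one-liner. The sketch below is not the jobdirect.com code; show_categorization_p is a hypothetical name and the threshold of 8 is taken from the description above.

```tcl
# Hypothetical helper: offer the folder interface only once the list is
# long enough to need it, or once the user has already opted in.
proc show_categorization_p {n_favorites opted_in_p {threshold 8}} {
    return [expr {$opted_in_p || $n_favorites >= $threshold}]
}

puts [show_categorization_p 1 0]
puts [show_categorization_p 8 0]
puts [show_categorization_p 2 1]
```

Keeping the threshold as an argument makes it easy to move into a parameter file later.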

For the advanced user, given that you're going to have categorization you might ask how much is needed. Users are familiar with the hierarchical directory structures in the Windows and Macintosh file systems. Or are they? Hierarchical file systems were lifted from the operating systems of the 1960s and pushed directly into consumers' laps without anyone asking the question "Are desktop users in fact able to make effective use of this interface directly?" The programmers who built file systems needed an O(log n) retrieval method for files. A tree data structure yields O(log n) retrieval, so a file system has an underlying hierarchical structure. The programmers were too lazy to develop any kind of categorization or database scheme on top of the hierarchical tree so they just exposed the tree structure directly to users. So let's not invest too much authority in tree-structured file systems.

Even if they have painfully learned to manage a hierarchy of files on their desktop, do users want to manage another hierarchy on each Web service that they use?

Do we need elaborate hierarchies? Consider the user who has 1000 items to manage but is very likely to want to work on the 20 selected or uploaded in the last month. Does this user need to wade through 1000 listings to find the 20 most recent? No, not if we provide a "sort by most recent" option. Then the user can simply look at the top of the page and not scroll down too much.

Can we survive with only one level of hierarchy? I think so. Especially if

you provide ways to sort by creation or modification date within a category
you provide multiple hierarchies (e.g., by project, by subject)

Searching -- is scoring better than a naive SQL query?

Suppose that you're faced with the task of letting the user search through some data. One way to go about this is to give the user the ability to type SQL queries.

User typing SQL queries?!?!? Am I insane? How could a random Web surfer be expected to master the profundities of SQL syntax?

Thus the average Web developer will typically build an HTML form to shield the user from the complexity of SQL while retaining all the power of SQL. This form will have one input for every column in the table, perhaps with some ability for a user to specify operators (e.g., "less than", "equal to", "starting with"). The form will have a select box or radio button set where the user can decide whether he wants to AND or OR the criteria.

This approach shields the user from the trivial syntactic complexity of SQL but directly exposes the far more brain-numbing semantic complexities of SQL in general and the publisher's data model in particular.

Bottom Line Principle 1: the first search form that your user sees ought to be a single text entry box, just like AltaVista's. The results page can explain how the results were obtained and perhaps offer a link to an advanced search form at the bottom (on the presumption that the user has scanned all the results and found them inadequate).

Let's now consider the case of the user who fills out a multi-input search form or types a long phrase into a text search box. I.e., the user has given the server lots of information about his or her interests. What is this user's reward? Generally fewer results than would be delivered to a user who only provided one query word or filled in one field in the moby search form. Compare this to AltaVista, Lycos, and other full-text search systems that people use every day. The more words a user gives a public search engine, the more results are returned (though oftentimes only the first 20 or 30 are displayed).

Bottom Line Principle 2: the more information a user gives to your server the more results your server should offer to the user.

This principle seems dangerous in practice. What if the user types so many words that essentially every item in the database is a match? Wouldn't it be better to offer an advanced search form that lets the user limit results explicitly?

Very seldom. Users are terrible at formulating boolean queries. Most often, they'll come up with a query that matches every row in your database or a query that matches none. You really shouldn't engineer software so that it is possible for the server to return a page saying "Your query returned zero results."

What's the way out? Suppose that you could score every row in the database against the user's criteria. It would then be perfectly acceptable to return every row in the database, ranked by descending score. The user need only look at the top of the page and may ignore the less relevant results.

Is this a radical idea? Hardly. All the public search engines use it. They may return tens of thousands of results if a user supplies a long query string but the most relevant ones are printed first.

Bottom Line Principle 3: Scoring and ranking and returning the top scoring items is a much better user interface than forcing the user into a simplistic binary in/out.
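
To illustrate scoring and ranking, here is a deliberately naive sketch: one point for each distinct query word found in a row. The scoring rule, the procedure names, and the sample ads are all assumptions for illustration; a production site would more likely score inside the database or with a full-text engine.

```tcl
# Hypothetical scoring: one point for each distinct query word that occurs
# in the row's text (case-insensitive substring match).
proc score_row {text query_words} {
    set text [string tolower $text]
    set score 0
    foreach word [lsort -unique [string tolower $query_words]] {
        if {[string first $word $text] >= 0} {
            incr score
        }
    }
    return $score
}

# Return {score text} pairs for every row, best match first.
proc rank_rows {rows query_words} {
    set scored {}
    foreach row $rows {
        lappend scored [list [score_row $row $query_words] $row]
    }
    return [lsort -integer -decreasing -index 0 $scored]
}

set ads {
    "Nikon 50mm lens, mint"
    "Canon zoom lens"
    "Tripod, barely used"
}
foreach pair [rank_rows $ads "nikon lens mint"] {
    puts [join $pair ": "]
}
```

Every row comes back, even the tripod with a score of zero, but the ad matching all three words is printed first. Adding words to the query can only add points, so a longer query never shrinks the result list.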

Suppose that your users are giving you criteria that are more structured than free text. What's a good user interface? On the search form, ask for preferences but provide checkboxes to "absolutely exclude items that don't meet this criterion". On the results page, print items as follows:

Items that meet all your criteria

98: foobar
92: yow

...
Items that meet some of your criteria

83: blatzco
83: bard
82: cookie monster

...
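
A results page in this shape can be produced by splitting the scored items into two groups before printing. The sketch below assumes each item already carries a score and a count of criteria met; the counts are invented for the example, items meeting no criteria are dropped (an assumption of the sketch), and the "absolutely exclude" checkboxes are not modeled.

```tcl
# Each item is {name score n_criteria_met}; n_criteria is how many criteria
# the user supplied. Returns a two-element list: items meeting all criteria,
# then items meeting only some, each sorted by descending score.
proc split_by_criteria {items n_criteria} {
    set all {}
    set some {}
    foreach item $items {
        lassign $item name score n_met
        if {$n_met == $n_criteria} {
            lappend all $item
        } elseif {$n_met > 0} {
            lappend some $item
        }
    }
    return [list [lsort -integer -decreasing -index 1 $all] \
                 [lsort -integer -decreasing -index 1 $some]]
}

set items {
    {blatzco 83 2} {foobar 98 3} {"cookie monster" 82 1} {yow 92 3} {bard 83 2}
}
lassign [split_by_criteria $items 3] all some
puts "Items that meet all your criteria"
foreach item $all { puts "[lindex $item 1]: [lindex $item 0]" }
puts "Items that meet some of your criteria"
foreach item $some { puts "[lindex $item 1]: [lindex $item 0]" }
```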

Warning signs that you don't know SQL

Most Web programmers suffer from the delusion that they know SQL and understand Oracle. This delusion stems from the euphoria of getting a Web page to work. In reality, most Web programmers are very weak SQL developers and the only things that save them are the incredible speed of modern computers and the relative paucity of traffic on most Web sites.

Here are some warning signs that you need to get help from a real SQL programmer:

you've built a page that uses lock table
you query Oracle for N things and then use Tcl code to filter out some that don't fit your criteria for display. I.e., you don't use all of the data that you query from Oracle. SQL is a very powerful query language and, supplemented on occasion with PL/SQL or Java inside the database, it is always possible to do your filtering inside Oracle rather than dragging data across SQL*Net to filter in Tcl.
you've built a page that queries Oracle for a list of stuff and then, for each thing in the list, goes back to Oracle with another query. So if you had 1000 things on the list, you'd go to Oracle a total of 1001 times for this page. This kind of page can almost always be slimmed down to 1 single query with an outer JOIN and GROUP BY. You might need to JOIN against an on-the-fly view. In the worst case you might need a PL/SQL procedure.
you've gone into SQL*Plus and set timing on and set autotrace trace and find that some of your queries are taking more than a fraction of a second and/or requiring full table scans. Online systems should try to get everything done within 1/10th of a second. Remember that if your page takes 1/10th of a second you can only serve 10 pages/second per processor.

Sharing data among threads

The AOLserver ns_share construct is very slow in the Tcl 8.2 version of AOLserver. We recommend the use of the much more powerful nsv facility, documented in README-NSV.txt.
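
As a taste of nsv, here is a shared page-view counter. The array and key names are made up. Since the nsv commands exist only inside AOLserver, the sketch defines simple single-threaded stand-ins when they are missing so that it can be tried in a plain tclsh; see README-NSV.txt for the real commands.

```tcl
# Outside AOLserver the nsv commands do not exist, so define simple
# single-threaded stand-ins. Inside AOLserver this block is skipped.
if {[info commands nsv_set] eq ""} {
    proc nsv_set {array key value} {
        global nsv_demo
        set nsv_demo($array,$key) $value
    }
    proc nsv_get {array key} {
        global nsv_demo
        return $nsv_demo($array,$key)
    }
    proc nsv_incr {array key {count 1}} {
        global nsv_demo
        incr nsv_demo($array,$key) $count
    }
}

nsv_set site_stats page_views 0
nsv_incr site_stats page_views
nsv_incr site_stats page_views
puts [nsv_get site_stats page_views]
```

Note that every access names the array and the key explicitly; there is no ordinary Tcl variable to read or write.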

Filters

Use of ns_register_filter is deprecated as of ACS 3.2 - it's been replaced with ad_register_filter, a drop-in replacement which supports some extra flags. ad_register_filter provides the following functionality:

Priorities - filters are executed in order of priority (lowest number to highest). For instance, the security filter (which must be run before anything else) has a priority of 1, compared to the default of 10000.
Monitoring - using /admin/monitoring/filters you can see which filters will run for any given request.
Debugging - invocations of the filter can be logged.
Error recovery - if a non-critical filter throws an error, subsequent filters will still be run (AOLserver's default behavior is to terminate the connection if any filter fails).

To provide this extra flexibility, ACS actually registers a single "über-filter" with AOLserver and handles filtering itself (in ad_handle_filter).
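
The dispatch logic of such an über-filter can be sketched in a few lines. This is not the code of ad_handle_filter; the demo_ procedures, the critical_p flag as modeled here, and the three sample filters are assumptions made for illustration. The priorities 1 and 10000 are the ones mentioned above.

```tcl
# Registered filters are kept as {priority critical_p command} triples.
set demo_filters {}
proc demo_register_filter {priority critical_p command} {
    global demo_filters
    lappend demo_filters [list $priority $critical_p $command]
}

# Run every filter in ascending priority order. An error in a non-critical
# filter is logged and skipped; an error in a critical one stops the chain.
# Returns the list of filters that ran successfully.
proc demo_handle_filters {} {
    global demo_filters
    set ran {}
    foreach filter [lsort -integer -index 0 $demo_filters] {
        lassign $filter priority critical_p command
        if {[catch {uplevel #0 $command} err]} {
            puts "filter '$command' failed: $err"
            if {$critical_p} { break }
            continue
        }
        lappend ran $command
    }
    return $ran
}

proc check_security {} { return ok }
proc count_hit {}      { error "stats table unavailable" }
proc add_headers {}    { return ok }

demo_register_filter 10000 0 add_headers
demo_register_filter 1     1 check_security
demo_register_filter 5000  0 count_hit
puts [demo_handle_filters]
```

Registration order does not matter: the security check runs first because of its priority, and the failure of count_hit does not prevent add_headers from running.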

You can use Perl to change all your legacy code to use ad_register_filter:

perl -pi -e 's/ns_register_filter/ad_register_filter/g' files-to-process...

Scheduled Processes

ns_schedule_proc is also deprecated as of ACS 3.2 - use ad_schedule_proc instead. ad_schedule_proc is almost a drop-in replacement (the syntax for flags is slightly different, as the -thread and -once switches require arguments - see the documentation). Using ad_schedule_proc lets you track which scheduled procedures are about to be run and when (view /admin/monitoring/schedule-procs).

You can use Perl to change all your legacy code to use ad_schedule_proc:

perl -pi -e 's/ns_schedule_proc( -\w+)?/"ad_schedule_proc$1".($1 ? " t" : "")/eg;' files-to-process...

This adds the necessary t after the -thread or -once flag (e.g., converts ns_schedule_proc -once to ad_schedule_proc -once t).

philg@mit.edu