Chat
part of the
ArsDigita Community System
by
Philip Greenspun
I. Essentials
II. Introduction
We have our own chat server because:
- we want our entire system to be integrated, i.e. one users table means one
comprehensive view of what a user has done on our site;
- we want to be able to create and drop chat rooms in conjunction with
the creation or dropping of user groups or other publisher-specific
database structures, e.g., the creation of a class by a teacher. The
only practical way to do this is if a chat room is really an Oracle
table construct of some sort. Otherwise, we can't do this as a
transaction. It's unclear just how close we can come to accomplishing this
objective if we go with separate tables per chat room since CREATE TABLE
and DROP TABLE are not able to be rolled back;
- we want something that can be extended and maintained by any
ArsDigita programmer, i.e., something that requires only AOLserver Tcl
and Oracle skills;
- commercial chat servers tend to be unreliable and expensive to
maintain. They sometimes bring down entire Solaris machines.
A potential limitation of our system is that Oracle's consistency and
durability guarantees are expensive. If we had 50 submissions per
second and 1000 queries for chatroom updates per second, we would need a
huge Unix machine. In theory, a custom-built chat server ought to
be capable of higher performance for a given piece of hardware. In
practice, the commercial systems aren't programmed properly and they
crash. They also become unreliable and slow when, for example, the number
of chat rooms is large.
III. Historical Considerations
Why is a chat server useful? As traditionally conceived, it isn't. The
Internet is good at coordinating people who are separated in space
and time. If a bunch of folks could all agree to meet at a
specific time, the telephone would probably be a better way to support
their interaction.
The way that commercial chat server vendors pitch their products these
days is for customer service. Joe User is presumed to be using a
computer and his one phone line. So he can't just pick up the phone and
call them because he'd have to drop his Internet connection. Also, a
customer service person can handle four or five sessions at once if done
via something like AOL Instant Messenger rather than via the telephone.
IV. Competitive Analysis
So why can't companies that want to do 1:1 conversations just use
AOL
Instant Messenger (AIM)? AOL Instant Messenger works best with
some
additional software to be installed on the user's machine. This is free
software and it shipped with Netscape 4.x and is certainly available
to AOL service customers. But it is not universal and you can't be
guaranteed that because someone is connecting to your Web site that they
have the AIM client. An AIM-compatible Java applet is available. Since
a Java applet can only connect to the server from which it was
downloaded, you must reference this by including a link to
http://toc.oscar.aol.com
(the first thing this page does is execute some JavaScript that bounces
your browser back one page, so from the user's point of view it looks
like the Java client just popped up and they never left the real
publisher's page).
A second limitation to the "just use AIM" approach for customer service
is that AIM doesn't provide convenient canned responses. In theory, you
might be able to come up with 10 or 15 canned responses that would cover
50% of the inquiries. Then the same staff can handle more customers.
A third limitation to the "just use AIM" approach is that you can't have
rich content, e.g., in-line images, because AIM is not a Web browser and
can't render HTML.
A fourth reasons that AIM isn't adequate is that it is tough to measure
the effectiveness of your staff. The conversations aren't centrally
logged (though your staff could save them individually to disk). The
conversations aren't tied to your users table.
A good example of a company that has addressed many of these issues is
liveperson.com. For $250 per
customer service agent per month (prices as of April 1999), they will
just do everything for you on their server farm. There are a few
problems with LivePerson:
- The information in their database isn't automatically synchronized with
the information in your database; you'll have two users tables, one on
their server and one on your server.
- They don't solve the "public chat room" problem for those who wish
to have this; LivePerson is limited to customer service and is best
thought of as "a more specialized AIM".
V. Design Tradeoffs
Archive or not?
We have to drive the entire system design from a publishing decision:
are we interested in seeing archives of chat sessions? If we are, then
the one-table structure makes a lot more sense. We'll want to do a
single SQL query to see everything that a user has contributed. We'll
want to do a single SQL query to see how a customer service person is
doing.
A disadvantage of archiving is that it chews up disk space. Suppose
we're America Online and 1 million subscribers chat or AIM every day.
Let's further assume that each person types 50 rows of stuff, 100 bytes
each. That means our table will grow by 50 million rows and 5 GB every
day. After 20 days, we'll begin to bump up against the billion-row
table size that data warehouse experts suggest as a practical limit.
Does that mean we scrap our simple one-table system? Not yet.
Designing systems for the highest volume sites is gratifying in a nerd
ego sense, but it is a mistake if it leads to fewer features for the
user and the publisher. Remember that the future is not mass media but
personalized content and lots of interesting mid-sized communities.
Let's reconsider a community like photo.net where the public bboards get
no more than 2000 new messages per day, each one an average of 650
characters. That's an average of 1.3 Mbytes of new content every day,
implying 8000 days before a 9 GB hard drive is filled up and 500,000
days before we've built a billion-row table. Obviously this isn't going
to be a problem to archive indefinitely.
Let's look at the photo.net community another way. We have about 25,000
daily visits. Suppose that each of those 25,000 people used a photo.net
chat or instant messaging service to send as much communication as they
send via email. Assume 100 messages per day and 200 bytes per message
and all 25,000 people participating. That's 0.5 Gbytes per day. We
fill up a modern (April 1999) 36 GB disk drive after two months.
So it seems that on popular public sites we won't be able to store
everything that users type. At the same time, a real customer's
interaction with a real customer service person ought to be archived
forever (so that you can ask questions like "show me how many users who
talked to Kathy eventually bought an item").
VI. Data Model Discussion
One Table or Many Tables?
We've had good luck with the /bboard system since 1995. This uses one
table to store all the messages, one message per row with a column for
topic
. This has the virtue of simplicity. It also has the
virtue of cleanliness in that all the identically structured data in the
system is in a single table. This has the virtue of easy searchability
since Oracle is designed to build indices on one table at a time. This
has the virtue of transactionality for creation of bboard topics and
deletion of bboard topics; no tables are created or dropped when topics
are created or dropped.
A bad thing about the one-table structure is fragmentation and lack of
recovery of disk space when postings are deleted. For example, suppose
that you drop an entire bboard topic with 2000 archived messages. This
will result in 2000 random deletions from perhaps 1000 Oracle blocks.
The table won't shrink any, i.e., it will still consume just as many
disk blocks. The free space might not even be used on subsequent
inserts, depending on what percentage of the block is now free.
According to Oracle8 Tuning, page 146, index entries for
deleted rows aren't reclaimed until you do an alter index
***index_name*** rebuild
. In practice, it seems that the
bboard
table on photo.net hasn't suffered too badly from
this problem (after three months).
Oracle has b-tree indices that are a maximum of four levels deep
(header, two intermediate levels, leaf nodes). So you don't get
O(log n) access time through an index if the table has an
outrageous number of rows. The most important thing that we'll want to
do is query by chat room key and date-time. If we were to build a
concatenated index on these values, we'd probably have the header block
taken up with all the chat room names. Then the next level would be
ranges of times for a particular chat room. Then the third level could
be more ranges of times. Then the leaf nodes would point to the rowids
for specific dates. This could be pretty darn selective if Oracle is
smart about building the index.
How do we accomplish pruning and tuning?
If we can be sure that we always have at least twice as much disk space
as the chat that we want saved, we can do the following:
- assume the live chat table is
chat_msgs
- create a table called
chat_msgs_new
- select all the stuff we want to save from
chat_msgs
and
insert it into chat_msgs_new
- drop any integrity constraints that reference
chat_msgs
drop table chat_msgs
alter table chat_msgs_new rename to chat_msgs
- rebuild indices on
chat_msgs
- add any integrity constraints that reference
chat_msgs
This is a pretty risky operation and we'd want a PL/SQL program to do it
rather than rely on a human dba. Chat could be down for as much as an
hour so we'd want to do it on an early Sunday morning at the beginning
of each month (or something similar). We'll need to develop the Tcl
scripts so that they can say "Chat server is being maintained right now;
try back in one hour". The pruning/tuning should be done by an
AOLserver ns_schedule_proc that (1) sets the maintenance flag, (2)
executes the PL/SQL proc, (3) resets the maintenance flag.
Following this operation, the chat table will be as compact as possible.
VII. Legal Transactions
VIII. API
IX. User Interface
Types of chat we need to support
Public chat rooms. These are open to everyone in the
users_active
view. For moderation, we check perms using
the permissions package (where module =
"chat" and submodule = **chat_room_id**).
Private chat rooms. These are open to people in particular user
groups. We check perms using the permissions
package.
For either kind of chat room, we should support moderated chat. That is, a
posting doesn't go live until it has been approved by someone who has
the "moderator" or "administrator" role in user group associate with a
private chat room or, in the case of a public chat room, by someone who
is a member of the appropriate chat moderation group.
We want to support 1:1 messages for customer support, if nothing else.
We need one layer on top of this to make sure that users can find an
appropriate chat partner. For example, if Bill User says that he needs
support for his widget, the system has to find the least busy authorized
widget support person and start a 1:1 chat session between Bill and that
person.
For public community sites where nothing is being sold or supported, a
publisher might wish to limit the load on the server from all of this
1:1 chatting. In that case, we set an ad.ini file parameter to just
bounce users over to the AOL Instant Messenger infrastructure.
Options for the publisher
Some options are configurable per-room, e.g.,
- Is a room moderated?
- If so, by whom?
- Is a room restricted to users who are members of a particular group?
- Should messages expire after a certain number of days?
The per-system options are configurable in the ad.ini file. The big
items:
- can users create their own rooms?
- do you want the posting form on the top of the page and the most
recent messages at the top (this is better for users; they won't have to
scroll down after a refresh to see if there are new messages); or do you
want the messages to run down chronologically? (Note that theglobe.com
and other popular chat systems seem to use the "new messages on top" style.)
- do you want to offer users the option of using the system to send
private messages to each other (not very well supported in version 1.5)?
- do you offer users the ability to see the complete history of a chat
room or is it "use it or lose it" (Our default is to provide this but
note that many commercial chat systems do not provide history, at least
not to the users)
- exactly how many messages should be displayed when users get a chat
page (starts at "short" and if they click "more messages" they can
graduate to "medium" or "long")
- add a photograph or other graphic in the top left corner of all chat
pages?
Linking into the system from a static page
If you want to link into the chat system from a static page elsewhere on
your site, do it by linking to "enter-room" rather than the more
obvious "chat". That way other users will see people coming in.
<a href="/chat/enter-room?chat_room_id=142">Chat</a>
Ensuring high performance
Inevitably a system like this will require some polling, either by Java
clients, HTTP Refresh headers, or JavaScript. If we have 100 people in
a chat room and they are all polling every 2 seconds, we don't want to
have to buy an 8-CPU computer to support 50 queries per second right
into Oracle. So we make heavy use of
util_memoize
.
When a new posting is made, we force a cache update with
util_memoize_flush
.
More than 95% of the time, a user is getting
results from AOLserver's virtual memory and not from the database.
Why the HTML version can't autorefresh
The HTML page cannot have a Refresh: header for client-pull autofresh.
If you did this, the user would be at risk of losing what he or she was
typing into the post form.
If you care about database performance
An active chat server is going to result in a fair number of Oracle
transactions. You'll at least want to keep chat tables on a new
separate physical disk drive. You'll want indices on those tables to
reside on yet another new disk drive. Since every disk drive on a 24x7
server must be mirrored, that means you need four new disk drives to
implement this module.
Practical Experience from photo.net
We tested the service on photo.net. Here's a transcript excerpt from
the first day:
Justin (06:42:14) Anyone there?
Justin (06:46:41) alright...I'll assume everyone is asleep. Sleep well. Goodnight.
Tommy (06:55:33) anyone here?
Tommy (06:58:09) so this is supposed to be a camera shopping chat room...
Justin (07:27:04) I'm here.
Justin (07:28:25) But...I guess you were here...like a half hour ago, so I'm going to take my
circadain rhythm problem to the donut shop.
Lim (09:59:23) hi
Lim (09:59:51) anybody around?
Eve (10:19:10) Hi there.
Eve (10:24:14) anybody around? Are you still there Lim?
Kind of makes you feel that all those long hours spent programming and
maintaining Unix and Oracle were worthwhile...
X. Configuration/Parameters
[ns/server/yourservername/acs/chat]
EnabledP=1
; SystemName=Chat
; how long to cache the postings to a room (updates force a cache update
; so this theoretically could be 5 hours or whatever)
CacheTimeout=120
; how long will a room's properties (e.g., private group, moderation) be cached
RoomPropertiesCacheTimeout=600
UsersCanCreateRoomsP=0
; set to 1 if you want most recent postings on top; this is the way that
; theglobe.com and other familiar chat systems do it (keeps users from
; having to scroll to see new msgs)
MostRecentOnTopP=1
; do we want to offer users the option of sending private messages?
PrivateChatEnabledP=0
; do we offer users a link to a chat room's history?
ExposeChatHistoryP=1
; how many messages to display when users choose short medium or long
NShortMessages=25
NMediumMessages=50
NLongMessages=75
; show a picture at the index page and in individual rooms?
DefaultDecoration=<a href="/photo/pcd0865/rachel-mouth-3"><img HEIGHT=134 WIDTH=196 src="/photo/pcd0865/rachel-mouth-3.1.jpg" ALT="Mouth."></a>
; how often the javascript version should refresh itself
JavaScriptRefreshInterval=5
XI. Acceptance Tests
- Go to /admin/chat/ and create a public chat room
- Go to /chat/ and enter the room
- Log in to /chat/ as a different user on a different browser
- Open up the javascript version on both browsers and have a nice chat
- Delete all the messages from the room
- Delete the room
XII. Future Improvements/Areas of Likely Change
The chat module will be upgraded as a package to ACS 4.
XIII. Authors
philg@mit.edu