Webmail Design Document
by Erik Bielefeldt (adapted from the original by Jin Choi)
I. Essentials
When applicable, each of the following items should receive its own link:
II. Introduction
Email handlers are among the first user-level programs written for any
new operating system, and are one of the few core tools that almost
anyone using a computer will use on a regular basis. Most recently, we
have seen a blossoming of Web-based email systems such as Hotmail and
Yahoo Mail. Why build yet another mail system?
Some of the desirable traits of a mail system are:
- Centralized storage. Users should see the same email history every
time they check email, no matter which computer or email reader they happen
to be using.
- Reliability. Email is important. A disk failure or a negligent
sysadmin should not be a cause for losing it. The mail server should
always be running, and well-connected to the internet.
- Availability. Email should be readable wherever you are.
- Completeness and correctness. An email reader should be able to
receive, display, and send attachments. Any message it sends should be
standards-conforming. Because many other systems are not, it should be
able to handle common deviations from the standard.
The webmail module addresses the first three traits (the last is a
work in progress). These requirements argue for the primary message
store to remain on a well-administered server. These are the same
needs addressed by the designers of IMAP. IMAP solves all these issues
except for one: availability; an IMAP client isn't always installed on
every computer with a net connection, whereas a Web browser almost
always is. But a Web browser is a less-than-ideal interface for
reading email when compared to an all-singing, all-dancing mail
client. Thus, the ideal mail solution is an IMAP server with a web
interface that accesses the same message store as the IMAP client.
III. Historical Considerations
Mail systems with this architecture already exist. Oracle provides
software that does what the webmail module does, and probably
more. CriticalPath is a company that provides out-sourced email with
IMAP and web front ends. These may be better than
webmail. CriticalPath certainly has the advantage that it requires no
effort on the part of the user other than sending them a check every
once in a while. However, when I used CriticalPath, it was unreachable
or unuseably slow about half the time (usually due to network problems
to their server). I ran out of patience attempting to install Oracle
Email Server.
IV. Competitive Analysis
The downside to these other systems is lack of control. It is
difficult to modify the look or extend the features of an email server
without access to the source code. In the case of CriticalPath, you
are stuck with what they provide, and cannot integrate it to provide
web-based email as a seamless service of your web site. If you are
using the ArsDigita Community System, webmail provides the core of a
web-based email system that relies on proven, reliable systems to do
all of the hard work and that is simple to extend. If you are not
using the ACS, then perhaps studying an implementation of a working
system will aid you in building something suitable for your own needs.
V. Design Tradeoffs
VI. Data Model Discussion
The following section will step through the data model, discussing important or interesting aspects.
-- Domains for which we receive email.
create table wm_domains (
-- short text key
short_name varchar(100) not null primary key,
-- fully qualified domain name
full_domain_name varchar(100) not null
);
The
wm_domains
table contains the domains for which we expect
to receive mail. The
short_name
field stores the text
key used to differentiate mail to different domains as discussed
above. Qmail must be configured to handle email for these domains
manually.
-- Maps email accounts to ACS users.
create table wm_email_user_map (
email_user_name varchar(100) not null,
domain references wm_domains,
user_id not null references users,
primary key (email_user_name, domain, user_id)
);
wm_email_user_map
assigns email addresses to ACS users.
Why not just use the
email
column in the
users
table? This approach lets each ACS user receive
email for more than one email address and also permits flexibility on
which address is published to other registered users. As a row is
inserted into this table, the appropriate .qmail alias files are
created for that user.
-- Main mail message table. Stores body of the email, along
-- with a parsed text version with markers for attachments for MIME
-- messages.
create sequence wm_msg_id_sequence;
create table wm_messages (
msg_id integer primary key,
body clob,
-- plain text portions of MIME message; empty if
-- entire message is of type text/*.
mime_text clob,
message_id varchar(500), -- RFC822 Message-ID field
unique_id integer -- for both POP3 UIDL and IMAP UID
);
create index wm_messages_by_message_id on wm_messages(message_id);
This is the primary message table. It stores the body of the message,
a parsed plain-text version with markers for attachments if this is a
multipart MIME message, a denormalized Message-ID field for easy
reference by Message ID, and yet another ID field for IMAP
bookkeeping. The message_id field is not unique, since the same
message may have been received multiple times.
-- Stores attachments for MIME messages.
create table wm_attachments (
msg_id not null references wm_messages,
-- File name associated with attachment.
filename varchar(600) not null,
-- MIME type of attachment.
content_type varchar(100),
data blob,
format varchar(10) check (format in ('binary', 'text')), -- for interMedia INSO filter
primary key (msg_id, filename)
);
This table stores MIME attachments and associated information. The
format field enables the use of the interMedia INSO filters to search
encoded files, such as Word or PDF documents.
-- Maps mailboxes (folders, in more common terminology) to ACS users.
create sequence wm_mailbox_id_sequence;
create table wm_mailboxes (
mailbox_id integer primary key,
name varchar(100) not null,
creation_user references users(user_id),
creation_date date,
uid_validity integer, -- Needed for IMAP
unique(creation_user, name)
);
A "mailbox" is what other systems would term "folders."
-- Maps messages to mailboxes (and thus to users).
create table wm_message_mailbox_map (
mailbox_id integer references wm_mailboxes,
msg_id integer references wm_messages,
seen_p char(1) default 'f' check(seen_p in ('t','f')),
answered_p char(1) default 'f' check(answered_p in ('t','f')),
flagged_p char(1) default 'f' check(flagged_p in ('t','f')),
deleted_p char(1) default 'f' check(deleted_p in ('t','f')),
draft_p char(1) default 'f' check(draft_p in ('t','f')),
recent_p char(1) default 't' check(recent_p in ('t','f')),
primary key (msg_id, mailbox_id)
);
Messages "belong" to mailboxes, which "belong" to users. This table
exists to map messages to mailboxes and store subsidiary status
information for the messages. Why aren't these just fields in the
wm_messages
table? Originally, there was some notion that
identical messages to multiple recipients could be folded together to
save on storage. This may happen at a later date, but raises reference
counting issues.
-- Parsed recipients for a message; enables search by recipient.
create table wm_recipients (
msg_id integer not null references wm_messages,
header varchar(100) not null, -- to, cc, etc.
email varchar(300) not null,
name varchar(200)
);
create index wm_recipients_by_msg_id on wm_recipients(msg_id);
This table contains parsed recipient address information for each message.
-- Headers for a message.
create table wm_headers (
msg_id integer not null references wm_messages,
-- field name as specified in the email
name varchar(100) not null,
-- lowercase version for case insensitive searches
lower_name varchar(100) not null,
value varchar(4000),
-- various parsed versions of the value
time_value date, -- date/time fields
-- email and name, for singleton address fields like From
email_value varchar(300),
name_value varchar(200),
-- original order of headers
sort_order integer not null
);
create index wm_headers_by_msg_id_name on wm_headers (msg_id, lower_name);
Headers are stored separately from the message to aid in
searching. Where applicable, header values are parsed and stored in
typed fields. The original ordering of the headers is maintained, both
so that we can recreate the header block and because order is
significant for certain fields.
-- Table for recording messages that we failed to parse for whatever reason.
create table wm_parse_errors (
filename varchar(255) primary key not null, -- message queue file
error_message varchar(4000),
first_parse_attempt date default sysdate not null
);
If an error occurs while attempting to parse a message, we store a
record of the error in this log for the administrator to review. Only
the first occurrence of an error is logged for any file, to prevent
hundreds of identical error messages from clogging the log.
-- Used for storing attachments for outgoing messages.
-- Should be cleaned out periodically.
create sequence wm_outgoing_msg_id_sequence;
create table wm_outgoing_messages (
outgoing_msg_id integer not null primary key,
body clob,
composed_message clob,
creation_date date default sysdate not null,
creation_user not null references users
);
create table wm_outgoing_headers (
outgoing_msg_id integer not null references wm_outgoing_messages on delete cascade,
name varchar(100) not null,
value varchar(4000),
sort_order integer not null
);
create unique index wm_outgoing_headers_idx on wm_outgoing_headers (outgoing_msg_id, name);
create sequence wm_outgoing_parts_sequence;
create table wm_outgoing_message_parts (
outgoing_msg_id integer not null references wm_outgoing_messages on delete cascade,
data blob,
filename varchar(600) not null,
content_type varchar(100), -- mime type of data
sort_order integer not null,
primary key (outgoing_msg_id, sort_order)
);
-- Create a job to clean up orphaned outgoing messages every day.
create or replace procedure wm_cleanup_outgoing_msgs as
begin
delete from wm_outgoing_messages
where creation_date < sysdate - 1;
end;
/
declare
job number;
begin
dbms_job.submit(job, 'wm_cleanup_outgoing_msgs;',
interval => 'sysdate + 1');
end;
/
When composing messages for sending, the unsent message and any
attachments are stored in the database. When the message is sent, a
MIME message is composed consisting of the text of the message
followed by any attachments (there is currently no facility to
intersperse attachments with text), and the data in these tables
deleted. If a user begins to compose a message, then neglects to send
it, the unsent message would remain in the database indefinitely. We
could handle this either by presenting a list of unsent messages to
the user and presenting the option of completing or deleting the
message, or by periodically cleaning out any old unsent
messages. Webmail does the latter.
Unsent outgoing attachments could as well be stored in the filesystem,
but it is easier to manage them when they are all contained within the
database.
-- PL/SQL bindings for Java procedures
create or replace procedure wm_process_queue (queuedir IN VARCHAR)
as language java
name 'com.arsdigita.mail.MessageParser.processQueue(java.lang.String)';
/
create or replace procedure wm_compose_message (outgoing_msg_id IN NUMBER)
as language java
name 'com.arsdigita.mail.MessageComposer.composeMimeMessage(int)';
/
-- Parse the queue every minute. Queue directory is hardcoded.
declare
job number;
begin
dbms_job.submit(job, 'wm_process_queue(''/home/nsadmin/qmail/queue/new'');',
interval => 'sysdate + 1/24/60');
end;
/
These PL/SQL bindings for Java procedures are the heart of the
system.
wm_process_queue
attempts to parse every file in
the given directory as an email message, deliver it to a webmail user,
and delete the file. It is run every minute by an Oracle job.
The argument to wm_process_queue
here is one of the two
hardcoded paths in the webmail module. It must be changed if the
webmail user's maildir is created in a different location than that
shown.
-- Trigger to delete subsidiary rows when a message is deleted.
create or replace trigger wm_messages_delete_trigger
before delete on wm_messages
for each row
begin
delete from wm_headers where msg_id = :old.msg_id;
delete from wm_recipients where msg_id = :old.msg_id;
delete from wm_message_mailbox_map where msg_id = :old.msg_id;
delete from wm_attachments where msg_id = :old.msg_id;
end;
/
This trigger makes deleting messages easy; deleting from
wm_messages
will also delete the appropriate rows from
any subsidiary tables.
-- interMedia index on body of message
create index wm_ctx_index on wm_messages (body)
indextype is ctxsys.context parameters ('memory 250M');
-- INSO filtered interMedia index for attachments.
create index wm_att_ctx_index on wm_attachments (data)
indextype is ctxsys.context parameters ('memory 250M filter ctxsys.inso_filter format column format');
-- Trigger to update format column for INSO index.
create or replace trigger wm_att_format_tr before insert on wm_attachments
for each row
declare
content_type varchar(100);
begin
content_type := lower(:new.content_type);
if content_type like 'text/%' or content_type like 'application/msword%' then
:new.format := 'text';
else
:new.format := 'binary';
end if;
end;
/
-- Resync the interMedia index every hour.
declare
job number;
begin
dbms_job.submit(job, 'ctx_ddl.sync_index(''wm_ctx_index'');',
interval => 'sysdate + 1/24');
dbms_job.submit(job, 'ctx_ddl.sync_index(''wm_att_ctx_index'');',
interval => 'sysdate + 1/24');
end;
/
These indices and triggers enable full-text searches over messages. An
INSO filtered index is also created to allow full-text searches over
any attachments which contain text, including formatted documents.
VII. Legal Transactions
/admin/webmail/
The following legal transactions can occur from the events administration pages located under /admin/webmail/:
domains
--Domains may be created and deleted.
accounts
--Email accounts may be created or deleted.
/webmail/
The following legal transactions can occur from the events administration pages located under /webmail/:
messages
--Messages may be viewed, re-filed, or deleted.
composing messages
--New messages may be composed.
--Attachments may be added.
VIII. API
PL/SQL Procedures:
wm_process_queue (queuedir IN VARCHAR)
Processes the mail queue directory and inserts messages into the database
(scheduled to run every minute by default)
wm_compose_message (outgoing_msg_id IN NUMBER)
Given an outgoing_msg_id, updates the wm_outgoing_messages table
and sets the composed_message column to a complete message (including mail headers) which is ready to send.
proc wm_link_author { address }
Adds an author filter.
proc wm_link_subject { subject }
Adds a subject filter.
proc wm_header_display {msg_id header_display_style user_id}
Procedure for displaying headers.
proc wm_quote_message { author msg_text }
Quote text with "> " at the start of each line (for replying to messages).
proc wm_check_permissions { msg_id user_id }
Checks if user may view message.
proc wm_get_mime_part { conn context }
Gets the data from an attachment.
IX. User Interface
The user interface for webmail includes:
An interface for the user:
- Browsing available messages
- Reading specific messages
- Composing outgoing messages
- Customizing the user interface and functionality of the module
An interface for the administrator:
- Choosing domains handled by Webmail
- Adding and deleting users in those domains
- Viewing a list of recent Webmail errors
X. Configuration/Parameters
Jin Choi has written an excellent document on installation and configuration of the
Webmail module.
This covers configuring qmail, loading the data-model, the java files, and testing and
configuring the system.
XI. Future Improvements/Areas of Likely Change
Future improvements include the ability to block senders, save drafts, save copies of sent messages,
a more streamlined user-interface, custom signature files, and an IMAP and POP3 interface.
XII. Authors
Erik Bielefeldt