A fast cheap database-backed Web server
by Patrick Kelly (pkelly@kuentos.guam.net)
Submitted on: 1997-10-03
Last updated: 1997-10-03
For the very impatient
[ Apache + FastCGI ] +
[ FastCgiDevKit + TCL + PostgreSQL TCL extensions ] +
[ PostgreSQL ]
Long, Rambling Intro
Let me explain why I'm interested in
database-backed web servers.
I'm a cofounder of Kuentos Communications,
Inc., an Internet
service provider on Guam (a U.S. territory in the North Pacific,
between Sydney and Tokyo). I have the bounty of being the main
(mostly only) programmer for the company. I have created a
customer database and billing system in Perl (with data in flat
files and DB files), that is partially connected to web pages.
It works most of the time. We rely on it for our financial
survival (without it, we could not bill our customers). The
irony is that I wrote the thing, but I'm afraid to touch it. Of
course I do occasionally add little bits here and there, but
I've broken things and caused grief for our staff and thousands
of customers too often. It's brittle.
So, being the modern, forward-thinking individual that I am,
I've been trying to get my partners interested in using a real
(i.e., SQL) database (like I learned how to use back in '85), so
that the whole process of improving our services to our
customers could move forward a little faster, and with less
stress. Well, whenever I actually got an idea of how much a real
database server would cost and talked it over with my partners,
the price was obviously too much.
There are some conditions that restrict our options. We like
Unix. We want the server on Unix, so that we can properly
administer it. We don't like expensive hardware, so we use
Intel boxes with BSDI Unix
for our main servers. (We went with
BSDI
because we wanted to have a phone number to call when
something was beyond us. We averaged less than one call per
year to them, and our contract has since lapsed.)
So, from time to time I would wander out to the net and download
a free database server, try to install it and get something to
happen. I did this on BSDI, and was also playing with FreeBSD
for my personal web server. From FreeBSD
I got onto PostgreSQL.
I downloaded and installed it and it generally worked.
(Actually, I had trouble running it on my 386 FreeBSD box, I
believe because of the lack of an FPU; the floating-point tests
all bombed.)
The Point
Well, to get on with the point of this article, here are some
important things that I learned, or pretty much knew but had
strongly reinforced by reading Philip's book
on Web publishing:
1. Having a real relational (SQL) database is good.
2. Having source code is good.
3. fork()ing is bad.
4. Tcl can be useful.
I knew (1) from having used one before, and from having not
used one when I should have. (2) is obvious. (4) I sort of
knew, because my friend Laurence Lundblade
had used it for a
project
of his. But, I had never actually seen much of it.
Apache seemed like the web server to use.
I can get source
code. I had used it before. It is FREE.
I looked through the www.apache.org
and www.apacheweek.com sites
looking for info about scripting languages that could be bundled
into the webserver, because "fork()ing is bad". On
www.apacheweek.com
I found an article on Dynamic Page Languages.
I almost went after embedded Perl,
but being sick of Perl, and
having learned that "Tcl can be useful", I went after
NeoWebScript at www.neosoft.com.
I downloaded it and fought with it for a
while.
In playing with NeoWebScript and Tcl, I learned that there were multiple
ways to connect to PostgreSQL. With NeoWebScript, PostgreSQL
connectivity can be built right in! Oh, joy! I thought everything was
solved. I did get the NeoWebScript web server to build, but I couldn't
quite get the PostgreSQL stuff happening. I did lose steam when I
realized that even if I did get the thing working right, I would still
have to open a new connection to the PostgreSQL server whenever a page
that needed database access was served. PostgreSQL, like most
Unix servers, forks a process to handle each new connection. Ack!
NeoWebScript was going to solve the CGI forking problem, but I was still
left with forking a database server.
I don't remember where, or how I found it, but I had read about
FastCGI.
It must have been on www.apacheweek.com when I was
checking out scripting stuff. It seemed interesting. The main
point being that a FastCGI
program is just like a CGI,
except it
loops waiting for a request. When a request comes in, it
handles it, and then resumes waiting. It doesn't fork() for
every request.
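The skeleton of a FastCGI program, then, looks like this (a minimal
sketch; the Tcl binding I end up building below exposes the accept
call as FCGI_Accept):
while {[FCGI_Accept] >= 0} {
    # Each pass through this loop handles one request; the
    # process itself is started once and never fork()s.
    puts -nonewline "Content-type: text/plain\r\n\r\n"
    puts "Hello from a long-lived process"
}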
While I was messing with NeoWebScript and Tcl, I learned that
Tcl is supposed to be easy to glue together with other
things to build a new application. (That's its whole
purpose, really.) I had also found out that PostgreSQL
is distributed
with code to build "pgtclsh", an extended Tcl
shell that can
talk to a PostgreSQL database server. I figured, heck, let me
try to put the FastCGI functions, plus
the PostgreSQL functions
into a single Tcl
shell. Of course, I would have to build a
FastCGI-capable web server. I did that with Apache 1.2.4 and the mod_fastcgi Apache module.
The end model is this:
Apache + FastCGI <==> FastCGI + Tcl + pgTcl <==> PostgreSQL
There are three processes running (simplest case):
1. The web server (as usual).
2. The FastCGI Tcl script, which is pre-launched by the web
server. It sits and waits for requests from the server.
3. The PostgreSQL server.
In the Tcl script, before we begin waiting for a request from the
web server, we connect to the database. This way the database
connection is already open and ready to go when a request
arrives, and it stays open across many requests.
Was this difficult to build? Well, it's a little tricky. But, it
was much easier than getting NeoWebScript (which was nearly
completely pre-packaged and "ready-to-go") running with its
database module working. I did not manage to put everything
together very neatly, but it's all there (plus some).
[I suspect that there are probably people who have downloaded
NeoWebScript and compiled and run with it with no problems. I
may have just missed something simple. I would love to hear
from you if you have a better history with NeoWebScript.]
I also had a picky complaint with NeoWebScript. I don't like
server-side parsed HTML. Having HTML
that contains a program that outputs HTML is more loopy than I
like. Also, I don't like the idea of every HTML file on my
server (I know I can configure it otherwise) getting parsed on
its way out the door. I prefer to have a complete program that
simply puts out the entire HTML. That's just the way I am.
My Recipe
Apache + mod_fastcgi.c
This was nearly trivial. With Apache, you unpack it, cd to the src
directory, do ./Configure, and (g)make. To add FastCGI
support, edit the "Configuration" file to include this line:
Module fastcgi_module mod_fastcgi.o
I put it next to the "mod_cgi.o" line in mine.
Then, do the ./Configure again and re-make. That's about
it for building the httpd server. (Of course, make sure you've fetched the
mod_fastcgi.c file and put it in the src directory.)
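For the record, the whole sequence is roughly this (a sketch; adjust
the Apache version and paths to your own setup):
cd apache_1.2.4/src
cp /wherever/you/put/mod_fastcgi.c .
# edit Configuration: add "Module fastcgi_module mod_fastcgi.o"
./Configure
make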
Building FastCGI + Tcl
The FastCGI stuff works by replacing the stdio package of
whatever program you're building it into. This works for C, Perl, Tcl,
whatever (mostly). I ended up using Tcl 7.4p3 (don't forget the patches)
because the other version I tried (7.6) was set up to be compilable for
other (non-Unix) platforms. I think the I/O layer in the newer Tcl
was too complicated for FastCGI to override easily.
The key document for putting FastCGI into Tcl is "fcgi-devel-kit/doc/fcgi-tcl.htm".
There are two main sections: "2. Recipe" and "3. Recipe Explained". Section 2 is
supposed to be the quicker route, but I found following Section 3 easier
(especially when I got around to pulling in the PostgreSQL functions).
After you build the tclsh with the FastCGI stuff included, you should be able to do
something like this:
./tclsh
% info commands FCGI_Accept
FCGI_Accept
% exit
If it doesn't come back with "FCGI_Accept", then it either didn't get put together right,
or you're running the wrong tclsh (make sure to include "./").
Building FastCGI + Tcl + PostgreSQL
The PostgreSQL distribution includes stuff to build a "pgtclsh" (PostgreSQL Tcl shell).
The included docs say that it's meant for Tcl 7.4 as well, but there were a few functions
that I had to comment out. I believe these were calls to functions introduced by newer
versions of Tcl (contradicting the docs) and that I did no harm in commenting them out.
But I'm not totally sure. (More on this below.)
In Makefile.in I ended up with these:
# Additional libraries to use when linking. The "LIBS" part will be
# replaced (or has already been replaced) with relevant libraries as
# determined by the configure script.
LIBS = /home/postgres/tcl/fcgi-devel-kit/libfcgi/libfcgi.a \
/home/postgres/lib/libpq.a \
@LIBS@
# To change the compiler switches, for example to change from -O
# to -g, change the following line:
CFLAGS = -O -I/home/postgres/tcl/fcgi-devel-kit/include \
-include /home/postgres/tcl/fcgi-devel-kit/include/fcgi_stdio.h \
-I/home/postgres/postgresql-v6.1.1/src/include \
-I/home/postgres/postgresql-v6.1.1/src/libpq \
-DDLLAPI=
GENERIC_OBJS = regexp.o tclAsync.o tclBasic.o tclCkalloc.o \
tclCmdAH.o tclCmdIL.o tclCmdMZ.o tclExpr.o tclGet.o \
tclHash.o tclHistory.o tclLink.o tclParse.o tclProc.o \
tclUtil.o tclVar.o tclFCGI.o pgtcl.o pgtclCmds.o pgtclId.o \
neo_stuff.o
SRCS= regexp.c tclAsync.c tclBasic.c tclCkalloc.c \
tclCmdAH.c tclCmdIL.c tclCmdMZ.c tclExpr.c tclGet.c \
tclHash.c tclHistory.c tclLink.c tclParse.c tclProc.c \
tclUtil.c tclVar.c panic.c tclEnv.c tclGlob.c tclMain.c \
tclMtherr.c tclUnixAZ.c tclUnixStr.c tclUnixUtil.c \
tclTest.c tclAppInit.c tclFCGI.c pgtcl.c pgtclCmds.c pgtclId.c \
neo_stuff.c
You'll have to change the paths there, obviously. I never quite
figured out what "DLLAPI" was supposed to do, but defining it as
nothing made the compiler much happier. The neo_stuff.*
is some stuff I stole from the NeoSoft people (and hacked around a little bit).
Here's a link to neo_stuff.c.
The other pg*.[ch] stuff I copied over to my Tcl directory from the PostgreSQL
distribution.
In tclAppInit.c I inserted:
/* In Tcl_AppInit(), next to the existing Tcl_Init() call:
   register the pg_* commands from the PostgreSQL Tcl binding. */
if (Pgtcl_Init(interp) == TCL_ERROR) {
    return TCL_ERROR;
}
In pgtcl.c I commented out the stuff regarding
"Tcl_(Create|Delete)ExitHandler" and "Tcl_PkgProvide".
I think that these are calls to features introduced in
later editions of Tcl. (If I've done evil, please let me know.)
To add the commands that I stole from NeoSoft, I added these lines:
Tcl_CreateCommand(interp, "html",
Tcl_HtmlCmd,
(ClientData) 0, (Tcl_CmdDeleteProc *) NULL);
Tcl_CreateCommand (interp, "unescape_string",
Neo_UnescapeStringCmd,
(ClientData) NULL, (void (*)()) NULL);
Tcl_CreateCommand (interp, "escape_string",
Neo_EscapeStringCmd,
(ClientData) NULL, (void (*)()) NULL);
I've read that one of the major complaints against using Tcl
for CGI is that regular expressions are handled more efficiently
in Perl. Well, I stole the "Neo_UnescapeStringCmd" in order to
more efficiently handle incoming CGI data. I know this doesn't completely
answer that complaint, but I'm happier with this.
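For example, here is a minimal sketch of decoding GET form data with
the borrowed command (the splitting logic is my own, and I leave it
to you to verify how unescape_string treats "+"):
# Turn "name=Pat&island=Guam" into form(name) and form(island);
# unescape_string undoes the %XX encoding.
foreach pair [split $env(QUERY_STRING) &] {
    set idx [string first = $pair]
    if {$idx < 0} continue
    set name [string range $pair 0 [expr $idx - 1]]
    set form($name) [unescape_string [string range $pair [expr $idx + 1] end]]
}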
Then, in Bash:
export CC=gcc
./Configure
gmake
That gives a new tclsh:
./tclsh
% info commands pg_connect
pg_connect
% exit
Making a Fast Connection
Well, now that we have all the pieces, we can configure the web server, and
write a little FastCGI program that connects to the database.
I put these two lines in srm.conf:
FastCgiIpcDir /home/apache/fastsockets/
AddHandler fastcgi-script .fcgi
Then, something like this in either httpd.conf or srm.conf:
AppClass /home/apache/docs/payroll/list_employees.fcgi
The AppClass directive is what pre-launches the FastCGI process. There
are more arguments for things like keeping 5 processes waiting, etc.
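For example (the flag name is from memory, so check the mod_fastcgi
docs that come with the developer's kit), a pool of five ready
processes would look something like:
AppClass /home/apache/docs/payroll/list_employees.fcgi -processes 5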
The actual script looks something like this:
#!/usr/local/bin/fpgtcl
# Connect once, before the request loop; this one connection is
# reused for every request this process handles.
set db [pg_connect payroll]
while {[FCGI_Accept] >= 0} {
    # Everything inside this loop runs once per request.
    puts -nonewline "Content-type: text/html\r\n\r\n"
    puts "<h2>List Employees</h2>"
    puts "<table border=1>"
    pg_select $db {
        select emp_no, emp_name
        from emp
        order by emp_no
    } got_back {
        puts "<tr><td>$got_back(emp_no)</td><td>$got_back(emp_name)</td></tr>"
    }
    puts "</table>"
}
pg_disconnect $db
The important things I was after are exemplified here. The loop with the FCGI_Accept
is where the processing for each CGI request happens. Before the loop begins, the
connection to the database is opened with pg_connect. The connection stays open for
the life of the script (many, many requests).
I did some rough tests. I compared this script with the FastCGI turned on and off.
With FastCGI, it was about 10 times faster. This was just hitting the server locally,
requesting the page, reading the whole page, and then repeating. The database I was hitting
had only 15 records in it. But the point is that the forking overhead is gone.
In another test, I set up the server with 5 of these FastCGI processes in the queue. I then
hit the server from 3 neighboring Unix machines with 3 processes each, so potentially I had 9
simultaneous connections. I did 4500 hits in 2:28, or about 30 hits a second. About 50
requests didn't get the proper data back, but I'm sure I could fix that by increasing my
pool of FastCGI processes.
Remaining problems
I haven't had much time with PostgreSQL, so I don't know how good it is.
It seems to have an active following, so somebody is using it (or
promoting it pretty well). I'd like to hear from people who are using it
(either in a web context or not) and what they think of it. I don't
know how well it will scale. It works great with tiny datasets.
Another thing that's missing here is full text indexing. There are a
lot of hooks in PostgreSQL for adding features, and it should be
possible to add a new indexing scheme for full text (maybe I have some
code lying around here, somewhere).
Greenspun Chimes In
Editor: I can't resist commenting a bit on Patrick's article. My main
caveat is that if you're not running your RDBMS in autocommit mode, you
have to be careful about reusing database connections. For example, if
one request issues a bunch of DML statements but then neither commits
nor rolls back, presumably because there was some kind of error, you
want to make sure that the next user of the database connection does not
issue some more DML statements and then COMMIT the whole bunch. I
sweated over this a bit when we wrote an Oracle driver for AOLserver. I
think Patrick is probably OK because an error would result in his
FastCGI process dying and getting restarted. When an RDBMS client (the
CGI script) dies, the database will roll back the uncommitted work.
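A cheap defense in Patrick's pgtcl setup (my sketch, not something he
describes; pg_exec and "pg_result -clear" are standard pgtcl commands)
would be to issue a rollback at the top of the accept loop:
while {[FCGI_Accept] >= 0} {
    # Clear any transaction a previous, failed request left open;
    # PostgreSQL just emits a notice if there is nothing to roll back.
    catch {pg_result [pg_exec $db "rollback"] -clear}
    # ... generate the page as before ...
}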
If you have $1000 in cash to spare, less time and Unix expertise than
Patrick, and don't need source code, I personally think you'll probably
be better off with Solid + AOLserver. Solid is normally $2000 but if
you tell them you read about it on the
CD-ROM for my book, then they will let you have it for $1000.
AOLserver is free. My reasons for preferring Solid are that I think it
is more carefully written than any of the freeware SQL programs. My
reasons for preferring AOLserver are that it has a bunch of useful API
calls such as "grab me a Web page from that other server" or "send
email".
asj-editors@arsdigita.com
Reader's Comments
Note:
AOLserver does not work with LinuxPPC as
of March 30, 1998. The Mac person on the
AOLserver team informed me that there are
no current plans to support LinuxPPC, but
that he "wants to get it running on his
G3 at home." So...that lame 7200 gathering
dust at work isn't going to be a good test
machine for AOLserver.
-- Nathanial Robertson, March 30, 1998
I like Patrick's recipe, but I have a few
ingredient changes that work for me in April '98:
Linux (RedHat 5.0)
AOLserver 2.3 (beta4)
PostgreSQL 6.3.1
AOLserver's built-in support for Postgres is
top-notch work. All of Philg's examples from
Database-backed Web sites run under the
AOL/Postgres combo except for the ones requiring
the full-text indexing of Illustra/PLS. I
regularly have uptimes in the months, with a
moderately loaded PPro 200.
For those that don't have the $500 to spend on
Solid WebEngine for Linux to use with AOLserver,
PostgreSQL (versions 6.2.1 and later) works
nicely.
-- Lamar Owen, April 20, 1998
I would like to suggest an alternate recipe. As
far as freeware tools go, you might want to look
into PHP3;
it fits the bill of a "Dynamic Page Language". It's
free, well supported, runs on most platforms, and I
think there is even a Win32 version. It runs as an
Apache module (so it's fast) and has a lot of hooks
into databases built in. You see it used a lot with
MySQL, but it works well with other SQL databases.
-- chuck freitas, November 12, 1998
Just as an FYI, AOL Server 3.0 is now open source. This, along with improvements in open source database systems, expands the possibilities for db-backed web sites.
Thanks, AOL! Thanks also to Philip, who I'm sure had more than a little influence in this decision. ;-)
-- Frank Wortner, September 1, 1999
I've done this, though not quite in the way suggested (the FastCGI thing sounds like a hack, in particular). Apache + mod_perl + DBI + DBI::Pg + postgresql works fine. That said, the project I've developed on this will be migrated to MySQL, since (roughly speaking) it's mainly SELECTs and the overhead of having a "proper" database is undesirable.
I don't know much about Tcl, other than a lasting (and irrational) dislike engendered by the ugly "make xconfig" tool in the Linux kernel source. So I can't comment on how to do things this way. But the Perl interface is quite neat (you can even make it object oriented, though I would question whether you would want to do this in many cases).
-- Chris Lightfoot, September 18, 1999
Also in the "I like Perl, not TCL" stakes is HTML::Mason, which offers Perl in pages (and, being built on mod_perl, doesn't fork). It's also fully buzzword compliant, offering templating, caching, and all that other stuff in glossy brochures. Most of it even works pretty well. (Personally, I'm quite comfortable with TCL or Perl, but I enjoy working in Perl more.)
And allow me to heartily endorse Postgresql. 7.0+ is very nice indeed.
-- Rodger Donaldson, August 3, 2000
This is my first attempt at posting, so here goes... I like how this thing is free, but one thing I rarely see is any comment about MySQL. I've been using MySQL with Perl for the past year and it's been quite fast, even on a single-processor 450MHz Ultra 10. Since I don't see much of it, I was wondering if there is anything I should know about that is inherently wrong with this choice. I know many that swear by it and many that despise it, but I have yet to get any hard evidence. I've just started with Tcl and I'm slowly catching on to its little ins and outs.
-- Christopher Brown, September 13, 2000
For MySQL, you could look at the opinion of the OpenACS team on why they chose Postgresql over MySQL.
The main things that bug me about MySQL, beyond the lack of transactional integrity, are:
- MySQL has a limited (unkind people would call it brain-dead) dialect of SQL. Without tools like subselects, programming with MySQL involves a lot of "do two SELECTs and kludge them together" nonsense.
- MySQL doesn't have any concept of referential integrity; programmers waste a large amount of time manually touching data across several tables. Postgresql allows the common SQL constraints (like cascading foreign keys), so the database can take care of changes (just like it should).
- MySQL locking is laughable; I'm sorry, but that's the only word for it. I shouldn't have to manually lock data, and writing should not block reading. If I wanted to worry about locking strategies, I'd write my own damn database.
Recent real-world-ish and benchmarketing tests have suggested that the main real benefit of MySQL - read speed - does not exist to the degree claimed.
As always, YMMV. If you like doing extra work when you don't have to and don't mind the possibility of data disappearing with a giant sucking sound, MySQL is great.
-- Rodger Donaldson, September 27, 2000
I've been using Postgres 7 for a while now, and find it fast and stable. I really love the PHP module for Apache, because it is very fast and handles Postgres beautifully.
I know Phil G likes his AOLServer for ns_httpget and such, but I think it is a bit more sensical to separate out the application logic and let the web server just handle requests. I do see how having Tcl compiled-in makes the performance a near non-issue, but with the PHP module running inside your server process, I really think you can take your pick. 'Course, you sacrifice all that sweet Ars Digita Tcl code (maybe I'll help volunteer to write the Apache-PHP port of OpenACS)...
-- Paul Smith, February 6, 2001