For the very impatient
[ Apache + FastCGI ] +
[ FastCgiDevKit + TCL + PostgreSQL TCL extensions ] +
[ PostgreSQL ]
Long, Rambling Intro
Let me and explain why I'm interested in
database-backed webservers.
I'm a cofounder of Kuentos Communications,
Inc., an Internet
service provider on Guam (a U.S. territory in the North Pacific,
between Sydney and Tokyo). I have the bounty of being the main
(mostly only) programmer for the company. I have created a
customer database and billing system in Perl (with data in flat
files and DB files), that is partially connected to web pages.
It works most of the time. We rely on it for our financial
survival (without it, we could not bill our customers). The
irony is that I wrote the thing, but I'm afraid to touch it. Of
course I do occasionally add little bits here and there, but
I've broken things and caused grief for our staff and thousands
of customers too often. It's brittle.
So, being the modern, forward thinking individual that I am,
I've been trying to get my partners interested in using a real
(i.e., SQL) database (like I learned how to use back in '85), so
that the whole process of improving our services to our
customers could move forward a little faster, and with less
stress. Well, whenever I actually started getting an idea how
much a real database server would
cost,
and talked with my
partners about it, it was obviously too much.
There are some conditions that restrict our options. We like
Unix. We want the server on Unix, so that we can properly
administer it. We don't like expensive hardware, so we use
Intel boxes with BSDI Unix
for our main servers. (We went with
BSDI
because we wanted to have a phone number to call when
something was beyond us. We made less than 1 phone call per
year to them and our contract has lapsed.)
So, from time to time I would wander out to the net and download
a free database server, try to install it and get something to
happen. I did this on BSDI,
and also was playing with FreeBSD
for my personal web server. From FreeBSD
I got onto PostgreSQL.
I downloaded and installed it and it generally worked.
(Actually, I had trouble running it on my 386 FreeBSD
box. I
believe because of the lack of FPU. The float tests all
bombed.)
The Point
Well, to get on with the point of this article. Here are some
important things that I learned, or pretty much knew, but had
strongly reinforced by reading Philip's book
on Web publishing:
- Having a real relational (SQL) database is good.
- Having source code is good.
- fork()ing is bad.
- Tcl can be useful.
I knew (1) from having used one before, and from having not
used one when I should have. (2) is obvious. (4) I sort of
knew, because my friend Laurence Lundblade
had used it for a
project
of his. But, I had never actually seen much of it.
Apache seemed like the web server to use.
I can get source
code. I had used it before. It is FREE.
I looked through the www.apache.org
and www.apacheweek.com sites
looking for info about scripting languages that could be bundled
into the webserver, because "fork()ing is bad". On
www.apacheweek.com
I found an article on Dynamic Page Languages.
I almost went after embedded Perl,
but being sick of Perl, and
having learned that "Tcl can be useful", I went after
NeoWebScript
to www.neosoft.com.
I downloaded it and fought with it for a
while.
In playing with NeoWebScript and Tcl, I learned that there were multiple
ways to connect to PostgreSQL. With NeoWebScript, PostgreSQL
connectivity can be built right in! Oh, joy! I thought everything was
solved. I did get the NeoWebScript web server to build, but I couldn't
quite get the PostgreSQL stuff happening. I did lose steam when I
realized that even if I did get the thing working right, I would still
have to open new connections to the PostgreSQL server whenever a page
that needed the database access was served. PostgreSQL, being like most
Unix servers, forks a process to handle each new connection. Ack!
NeoWebScript was going to solve the CGI forking problem, but I was still
left with forking a database server.
I don't remember where, or how I found it, but I had read about
FastCGI.
It must have been on www.apacheweek.com when I was
checking out scripting stuff. It seemed interesting. The main
point being that a FastCGI
program is just like a CGI,
except it
loops waiting for a request. When a request comes in, it
handles it, and then resumes waiting. It doesn't fork() for
every request.
In the time that I was messing with NeoWebScript
and Tcl, I learned that Tcl is supposed to be easy to put together with other
things and build a new application. (That's it's whole
purpose, really.) I had also found out that PostgreSQL
is distributed
with code to build "pgtclsh", an extended Tcl
shell that can
talk to a PostgreSQL database server. I figured, heck, let me
try to put the FastCGI functions, plus
the PostgreSQL functions
into a single Tcl
shell. Of course, I would have to build a
FastCGI
capable web server. I did that with Apache 1.2.4 and the Apache module from FastCGI.
The end model is this
Apache + FastCGI <==> FastCGI + Tcl + pgTcl <==> PostgreSQL
There are three processes running (simplest case). (1)
The web server (as usual).
(2) The FastCGI Tcl script, which is pre-launched by the web
server. It sits and waits for requests from the server. (3)
The PostgreSQL server. In the Tcl script, before we begin waiting
for a request from the web server, we connect to the database.
This way the database server process is already open and ready
to go when a request arrives, and it stays open for multiple
requests.
Was this difficult to build? Well, it's a little tricky. But, it
was much easier than getting NeoWebScript (which was nearly
completely pre-packaged, and "ready-to-go") running with the
database module happy. I did not manage to put everything
together very neatly, but it's all there (plus some).
[I suspect that there are probably people who have downloaded
NeoWebScript and compiled and run with it with no problems. I
may have just missed something simple. I would love to hear
from you if you have a better history with NeoWebScript.]
I also had a picky complaint with NeoWebScript. I don't like
server-side parsed HTML. Having HTML
which contains a program, that outputs HTML is more loopy than I
like. Also, I don't like the idea of every (I know I can
configure it otherwise) HTML file on my server getting parsed on
it's way out the door. I prefer to have a complete program that
simply puts out the entire HTML. That's just the way I am.
My Recipe
Apache + mod_fastcgi.c
This was nearly trivial. With Apache, you unpack it, cd to the src
directory, do ./Configure, and (g)make. To add the
FastCGI, edit the "Configuration" file to include this line:
Module fastcgi_module mod_fastcgi.o
I put it next to the "mod_cgi.o" line in mine.
Then, do the ./Configure again and re-make. That's about
it for building the httpd server. (Of course, make sure you've fetched the
mod_fastcgi.c file and put it in the src directory.)
Building FastCGI + Tcl
The FastCGI stuff works by replacing the stdio package for
whatever platform you're building it into. This works for C, Perl, Tcl,
whatever (mostly). I ended up using Tcl 7.4p3 (don't forget the patches)
because the other version I tried (7.6) was setup to be compilable for
other (non-Unix) platforms. I think the I/O stuff in the higher version of Tcl
was too complicated for the FastCGI to override easily.
The key document for putting FastCGI into Tcl is "fcgi-devel-kit/doc/fcgi-tcl.htm".
There are two main sections: "2. Recipe" and "3. Recipe Explained". Section 2 is
supposed to be easier, but I found that using 3 was easier. (Especially when I got
around to pulling in the PostgreSQL functions.)
After you build the tclsh with the FastCGI stuff included, you should be able to do
something like this:
./tclsh
% info commands FCGI_Accept
FCGI_Accept
% exit
If it doesn't come back with "FCGI_Accept", then it either didn't get put together right,
or you're running the wrong tclsh (make sure to include "./").
Building FastCGI + Tcl + PostgreSQL
The PostgreSQL distribution includes stuff to build a "pgtclsh" (PostgreSQL Tcl shell).
The included docs say that it's meant for Tcl 7.4 as well, but there were a few functions
that I had to comment out. I believe that these were functions introduced by newer
versions of Tcl (conflicting the docs) and that I did not harm in commenting them out.
But I'm not totally sure. (more on this below)
In Makefile.in I ended up with these:
# Additional libraries to use when linking. The "LIBS" part will be
# replaced (or has already been replaced) with relevant libraries as
# determined by the configure script.
LIBS = /home/postgres/tcl/fcgi-devel-kit/libfcgi/libfcgi.a \
/home/postgres/lib/libpq.a \
@LIBS@
# To change the compiler switches, for example to change from -O
# to -g, change the following line:
CFLAGS = -O -I/home/postgres/tcl/fcgi-devel-kit/include \
-include /home/postgres/tcl/fcgi-devel-kit/include/fcgi_stdio.h \
-I/home/postgres/postgresql-v6.1.1/src/include \
-I/home/postgres/postgresql-v6.1.1/src/libpq \
-DDLLAPI=
GENERIC_OBJS = regexp.o tclAsync.o tclBasic.o tclCkalloc.o \
tclCmdAH.o tclCmdIL.o tclCmdMZ.o tclExpr.o tclGet.o \
tclHash.o tclHistory.o tclLink.o tclParse.o tclProc.o \
tclUtil.o tclVar.o tclFCGI.o pgtcl.o pgtclCmds.o pgtclId.o \
neo_stuff.o
SRCS= regexp.c tclAsync.c tclBasic.c tclCkalloc.c \
tclCmdAH.c tclCmdIL.c tclCmdMZ.c tclExpr.c tclGet.c \
tclHash.c tclHistory.c tclLink.c tclParse.c tclProc.c \
tclUtil.c tclVar.c panic.c tclEnv.c tclGlob.c tclMain.c \
tclMtherr.c tclUnixAZ.c tclUnixStr.c tclUnixUtil.c \
tclTest.c tclAppInit.c tclFCGI.c pgtcl.c pgtclCmds.c pgtclId.c \
neo_stuff.c
You'll have to change the paths there, obviously. The "DLLAPI" I
didn't quite figure out what was supposed to happen with it, but
defining it as nothing made the compiler much happier. The neo_stuff.*
is some stuff I stole from the NeoSoft people (and hacked around a little bit).
Here's a link to neo_stuff.c.
The other pg*.[ch] stuff I copied over to my Tcl directory from the PostgreSQL
distribution.
In tclAppInit.c I inserted
if (Pgtcl_Init(interp) == TCL_ERROR) {
return TCL_ERROR;
}
In pgtcl.c I commented out the stuff regarding
"Tcl_(Create|Delete)ExitHandler" and "Tcl_PkgProvide".
I think that these are calls for a features in
later editions of Tcl. (If I've done evil, please me know.)
To add the commands that I stole from NeoSoft, I added these lines:
Tcl_CreateCommand(interp, "html",
Tcl_HtmlCmd,
(ClientData) 0, (Tcl_CmdDeleteProc *) NULL);
Tcl_CreateCommand (interp, "unescape_string",
Neo_UnescapeStringCmd,
(ClientData) NULL, (void (*)()) NULL);
Tcl_CreateCommand (interp, "escape_string",
Neo_EscapeStringCmd,
(ClientData) NULL, (void (*)()) NULL);
I've read that one of the major complaints against using Tcl
for CGI is that regular expressions are handled more efficiently
in Perl. Well, I stole the "Neo_UnescapeStringCmd" in order to
more efficiently handle incoming CGI data. I know this doesn't completely
answer that complaint, but I'm happier with this.
Then, in Bash,
export CC=gcc
./Configure
gmake
Which gives a new tclsh.
./tclsh
% info commands pg_connect
pg_connect
% exit
Making a Fast Connection
Well, now that we have all the pieces, we can configure the web server, and
write a little FastCGI program that connects to the database.
I put these two lines in srm.conf
FastCgiIpcDir /home/apache/fastsockets/
AddHandler fastcgi-script .fcgi
Then, something like this in either httpd.conf, or srm.conf
AppClass /home/apache/docs/payroll/list_employees.fcgi
The AppClass is necessary to pre-launch the FastCGI. There
are more arguments to do things like have 5 processes waiting, etc.
The actual script looks something like this:
#!/usr/local/bin/fpgtcl
set db [pg_connect payroll]
while {[FCGI_Accept] >= 0} {
puts -nonewline "Content-type: text/html\r\n\r\n"
puts "<h2>List Employees</h2>"
puts "<table border=1>"
pg_select $db {
select emp_no, emp_name
from emp
order by emp_no
} got_back {
puts "<tr><td>$got_back(emp_no)</td><td>$got_back(emp_name)</td></tr>"
}
puts "</table>"
}
pg_disconnect $db
The important things I was after are exemplified here. The loop with the FCGI_Accept
is where the processing for each CGI request happens. Before the beginning of this, the
connection to the database is opened with pg_connect. The connection stays open for
the life of the script (many, many requests).
I did some rough tests. I compared this script with the FastCGI turned on and off.
With FastCGI, it was about 10 times faster. This was just hitting the server locally,
requesting the page, reading the whole page, and then repeating. The database I was hitting
15 records in it. But the point is that the forking overhead is gone.
In another test, I set up the server with 5 of these FastCGI processes in the queue. I then
hit the server from 3 neighbor unix machines with 3 processes each. Potentially I could have had 9
simultaneous connections. I did 4500 hits in 2:28, or about 30 hits a second. I did get
about 50 requests that didn't get the proper data back, but I'm sure I could just increase my
pool of FastCGI's.
Remaining problem
I haven't had much time with PostreSQL, so I don't know how good it is.
It seems to have a lot of connections, so somebody is using it (or
promoting pretty well). I'd like to hear from people who're using it
(either in a web context or not) and what they think of it. I don't
know how well it will scale. It works great with tiny datasets.
Another thing that's missing here is full text indexing. There are a
lot of hooks in PostgreSQL for adding features, and it should be
possible to add a new indexing scheme for full text (maybe I have some
code lying around here, somewhere).
Greenspun Chimes In
Editor: I can't resist commenting a bit on Patrick's article. My main
caveat is that if you're not running your RDBMS in autocommit mode, you
have to be careful about reusing database connections. For example, if
one request issues a bunch of DML statements but then neither commits
nor rolls back, presumably because there was some kind of error, you
want to make sure that the next user of the database connection does not
issue some more DML statements and then COMMIT the whole bunch. I
sweated over this a bit when we wrote an Oracle driver for AOLserver. I
think Patrick is probably OK because an error would result in his
FastCGI process dying and getting restarted. When an RDBMS client (CGI
script) dies, the database will rollback the work.
If you have $1000 in cash to spare, less time and Unix expertise than
Patrick, and don't need source code, I personally think you'll probably
be better off with Solid + AOLserver. Solid is normally $2000 but if
you tell them you read about it in the
CD-ROM for my book then they will let you have it for $1000.
AOLserver is free. My reasons for preferring Solid are that I think it
is more carefully written than any of the freeware SQL programs. My
reasons for preferring AOLserver are that it has a bunch of useful API
calls such as "grab me a Web page from that other server" or "send
email".