A fast cheap database-backed Web server

by Patrick Kelly (pkelly@kuentos.guam.net)

Submitted on: 1997-10-03
Last updated: 1997-10-03

For the very impatient

	[ Apache + FastCGI ] + 
	[ FastCgiDevKit + TCL + PostgreSQL TCL extensions ] +
	[ PostgreSQL ]

Long, Rambling Intro

Let me explain why I'm interested in database-backed web servers. I'm a cofounder of Kuentos Communications, Inc., an Internet service provider on Guam (a U.S. territory in the North Pacific, between Sydney and Tokyo). I have the bounty of being the main (mostly only) programmer for the company. I have created a customer database and billing system in Perl (with data in flat files and DB files) that is partially connected to web pages. It works most of the time. We rely on it for our financial survival (without it, we could not bill our customers). The irony is that I wrote the thing, but I'm afraid to touch it. Of course I do occasionally add little bits here and there, but I've broken things and caused grief for our staff and thousands of customers too often. It's brittle.

So, being the modern, forward-thinking individual that I am, I've been trying to get my partners interested in using a real (i.e., SQL) database (like I learned how to use back in '85), so that the whole process of improving our services to our customers could move forward a little faster, and with less stress. Well, whenever I actually got an idea of how much a real database server would cost and talked with my partners about it, it was obviously too much.

There are some conditions that restrict our options. We like Unix. We want the server on Unix, so that we can properly administer it. We don't like expensive hardware, so we use Intel boxes with BSDI Unix for our main servers. (We went with BSDI because we wanted to have a phone number to call when something was beyond us. We made fewer than one phone call per year to them, and our contract has lapsed.)

So, from time to time I would wander out to the net, download a free database server, and try to install it and get something to happen. I did this on BSDI, and I was also playing with FreeBSD for my personal web server. Through FreeBSD I got onto PostgreSQL. I downloaded and installed it and it generally worked. (Actually, I had trouble running it on my 386 FreeBSD box, I believe because of the lack of an FPU. The float tests all bombed.)

The Point

Well, to get on with the point of this article: here are some important things that I learned, or pretty much knew but had strongly reinforced, by reading Philip's book on Web publishing:

  1. Having a real relational (SQL) database is good.
  2. Having source code is good.
  3. fork()ing is bad.
  4. Tcl can be useful.

I knew (1) from having used one before, and from having not used one when I should have. (2) is obvious. (4) I sort of knew, because my friend Laurence Lundblade had used it for a project of his. But, I had never actually seen much of it.

Apache seemed like the web server to use. I can get source code. I had used it before. It is FREE.

I looked through the www.apache.org and www.apacheweek.com sites for info about scripting languages that could be bundled into the web server, because "fork()ing is bad". On www.apacheweek.com I found an article on Dynamic Page Languages. I almost went after embedded Perl, but being sick of Perl, and having learned that "Tcl can be useful", I went after NeoWebScript at www.neosoft.com. I downloaded it and fought with it for a while.

In playing with NeoWebScript and Tcl, I learned that there were multiple ways to connect to PostgreSQL. With NeoWebScript, PostgreSQL connectivity can be built right in! Oh, joy! I thought everything was solved. I did get the NeoWebScript web server to build, but I couldn't quite get the PostgreSQL stuff happening. I did lose steam when I realized that even if I got the thing working right, I would still have to open a new connection to the PostgreSQL server whenever a page that needed database access was served. PostgreSQL, like most Unix servers, forks a process to handle each new connection. Ack!

NeoWebScript was going to solve the CGI forking problem, but I was still left with forking a database server.
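To see why that bothered me, consider a plain CGI version of a database-backed page: every single hit pays for a fork of the whole script interpreter, plus a new backend process forked by PostgreSQL when the script connects. A sketch, using the pgtclsh commands I get to below (the interpreter path is illustrative; "payroll" is the database from my later example):

    #!/usr/local/bin/pgtclsh
    # Plain CGI: the web server forks this entire interpreter for every
    # request, and pg_connect makes PostgreSQL fork a fresh backend
    # process every time, too.
    set db [pg_connect payroll]
    puts -nonewline "Content-type: text/html\r\n\r\n"
    puts "<h2>Hello from the database</h2>"
    pg_disconnect $db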

I don't remember where, or how I found it, but I had read about FastCGI. It must have been on www.apacheweek.com when I was checking out scripting stuff. It seemed interesting. The main point being that a FastCGI program is just like a CGI, except it loops waiting for a request. When a request comes in, it handles it, and then resumes waiting. It doesn't fork() for every request.

In the time that I was messing with NeoWebScript and Tcl, I learned that Tcl is supposed to be easy to put together with other things to build a new application. (That's its whole purpose, really.) I had also found out that PostgreSQL is distributed with code to build "pgtclsh", an extended Tcl shell that can talk to a PostgreSQL database server. I figured, heck, let me try to put the FastCGI functions plus the PostgreSQL functions into a single Tcl shell. Of course, I would have to build a FastCGI-capable web server. I did that with Apache 1.2.4 and the Apache module from FastCGI.

The end model is this:

    Apache + FastCGI  <==> FastCGI + Tcl + pgTcl <==> PostgreSQL

There are three processes running (simplest case):

  1. The web server (as usual).
  2. The FastCGI Tcl script, which is pre-launched by the web server. It sits and waits for requests from the server.
  3. The PostgreSQL server.

In the Tcl script, before we begin waiting for a request from the web server, we connect to the database. This way the database server process is already open and ready to go when a request arrives, and it stays open for multiple requests.

Was this difficult to build? Well, it's a little tricky. But it was much easier than getting NeoWebScript (which was nearly completely pre-packaged and "ready-to-go") running with the database module working. I did not manage to put everything together very neatly, but it's all there (plus some).

[I suspect that there are probably people who have downloaded NeoWebScript and compiled and run with it with no problems. I may have just missed something simple. I would love to hear from you if you have a better history with NeoWebScript.]

I also had a picky complaint about NeoWebScript. I don't like server-side parsed HTML. Having HTML which contains a program that outputs HTML is more loopy than I like. Also, I don't like the idea of every HTML file on my server (I know I can configure it otherwise) getting parsed on its way out the door. I prefer to have a complete program that simply puts out the entire HTML. That's just the way I am.

My Recipe

Apache + mod_fastcgi.c

This was nearly trivial. With Apache, you unpack it, cd to the src directory, do ./Configure, and (g)make. To add the FastCGI, edit the "Configuration" file to include this line:

Module fastcgi_module mod_fastcgi.o

I put it next to the "mod_cgi.o" line in mine.

Then, do the ./Configure again and re-make. That's about it for building the httpd server. (Of course, make sure you've fetched the mod_fastcgi.c file and put it in the src directory.)

Building FastCGI + Tcl

The FastCGI stuff works by replacing the stdio package for whatever platform you're building it into. This works for C, Perl, Tcl, whatever (mostly). I ended up using Tcl 7.4p3 (don't forget the patches) because the other version I tried (7.6) was set up to be compilable for other (non-Unix) platforms. I think the I/O stuff in the later version of Tcl was too complicated for FastCGI to override easily.

The key document for putting FastCGI into Tcl is "fcgi-devel-kit/doc/fcgi-tcl.htm". There are two main sections: "2. Recipe" and "3. Recipe Explained". Section 2 is supposed to be easier, but I found that using 3 was easier. (Especially when I got around to pulling in the PostgreSQL functions.)

After you build the tclsh with the FastCGI stuff included, you should be able to do something like this:

   ./tclsh
   % info commands FCGI_Accept
   FCGI_Accept
   % exit
If it doesn't come back with "FCGI_Accept", then it either didn't get put together right, or you're running the wrong tclsh (make sure to include "./").

Building FastCGI + Tcl + PostgreSQL

The PostgreSQL distribution includes stuff to build a "pgtclsh" (PostgreSQL Tcl shell). The included docs say that it's meant for Tcl 7.4 as well, but there were a few functions that I had to comment out. I believe that these were functions introduced by newer versions of Tcl (contradicting the docs) and that I did no harm in commenting them out. But I'm not totally sure. (More on this below.)

In Makefile.in I ended up with these:

   # Additional libraries to use when linking.  The "LIBS" part will be
   # replaced (or has already been replaced) with relevant libraries as
   # determined by the configure script.
   LIBS = /home/postgres/tcl/fcgi-devel-kit/libfcgi/libfcgi.a \
           /home/postgres/lib/libpq.a \
           @LIBS@

   # To change the compiler switches, for example to change from -O
   # to -g, change the following line:
   CFLAGS = -O -I/home/postgres/tcl/fcgi-devel-kit/include \
           -include /home/postgres/tcl/fcgi-devel-kit/include/fcgi_stdio.h \
           -I/home/postgres/postgresql-v6.1.1/src/include \
           -I/home/postgres/postgresql-v6.1.1/src/libpq \
           -DDLLAPI=

   GENERIC_OBJS =  regexp.o tclAsync.o tclBasic.o tclCkalloc.o \
           tclCmdAH.o tclCmdIL.o tclCmdMZ.o tclExpr.o tclGet.o \
           tclHash.o tclHistory.o tclLink.o tclParse.o tclProc.o \
           tclUtil.o tclVar.o tclFCGI.o pgtcl.o pgtclCmds.o pgtclId.o \
           neo_stuff.o

   SRCS= regexp.c tclAsync.c tclBasic.c tclCkalloc.c \
           tclCmdAH.c tclCmdIL.c tclCmdMZ.c tclExpr.c tclGet.c \
           tclHash.c tclHistory.c tclLink.c tclParse.c tclProc.c \
           tclUtil.c tclVar.c panic.c tclEnv.c tclGlob.c tclMain.c \
           tclMtherr.c tclUnixAZ.c tclUnixStr.c tclUnixUtil.c \
           tclTest.c tclAppInit.c tclFCGI.c pgtcl.c pgtclCmds.c pgtclId.c \
           neo_stuff.c

You'll have to change the paths there, obviously. I never quite figured out what "DLLAPI" was supposed to do, but defining it as nothing made the compiler much happier. The neo_stuff.* files are some stuff I stole from the NeoSoft people (and hacked around a little bit). Here's a link to neo_stuff.c. The other pg*.[ch] files I copied over to my Tcl directory from the PostgreSQL distribution.

In tclAppInit.c (inside Tcl_AppInit(), next to the other package initialization calls) I inserted:

   if (Pgtcl_Init(interp) == TCL_ERROR) {
      return TCL_ERROR;
   }
In pgtcl.c I commented out the stuff regarding "Tcl_(Create|Delete)ExitHandler" and "Tcl_PkgProvide". I think that these are calls for features in later versions of Tcl. (If I've done evil, please let me know.) To add the commands that I stole from NeoSoft, I added these lines:
  Tcl_CreateCommand(interp, "html", 
                    Tcl_HtmlCmd,
                    (ClientData) 0, (Tcl_CmdDeleteProc *) NULL);

  Tcl_CreateCommand (interp, "unescape_string", 
                     Neo_UnescapeStringCmd,
                     (ClientData) NULL, (void (*)()) NULL);

  Tcl_CreateCommand (interp, "escape_string", 
                     Neo_EscapeStringCmd,
                     (ClientData) NULL, (void (*)()) NULL);
I've read that one of the major complaints against using Tcl for CGI is that regular expressions are handled more efficiently in Perl. Well, I stole the "Neo_UnescapeStringCmd" in order to handle incoming CGI data more efficiently. I know this doesn't completely answer that complaint, but I'm happier with this.
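To show what I mean, here is a minimal sketch of using the borrowed command to pull apart a GET query string. The parse_query helper is just mine for illustration, and I'm assuming unescape_string does the usual CGI %XX (and +) decoding:

    # Split a query string like "a=1&b=hello+world" into a Tcl array.
    proc parse_query {query arrayName} {
        upvar $arrayName form
        foreach pair [split $query &] {
            set eq [string first = $pair]
            if {$eq < 0} continue
            set name [unescape_string [string range $pair 0 [expr $eq - 1]]]
            set value [unescape_string [string range $pair [expr $eq + 1] end]]
            set form($name) $value
        }
    }

    # Inside a request handler:
    #   global env
    #   parse_query $env(QUERY_STRING) form
    #   puts "Hello, $form(name)"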

Then, in Bash,

    export CC=gcc
    ./Configure
    gmake
This gives a new tclsh, which you can test the same way:
    ./tclsh
    % info commands pg_connect
    pg_connect
    % exit

Making a Fast Connection

Well, now that we have all the pieces, we can configure the web server and write a little FastCGI program that connects to the database. I put these two lines in srm.conf:
    FastCgiIpcDir /home/apache/fastsockets/
    AddHandler fastcgi-script .fcgi
Then, something like this goes in either httpd.conf or srm.conf:
    AppClass /home/apache/docs/payroll/list_employees.fcgi
The AppClass directive is necessary to pre-launch the FastCGI. There are more arguments to do things like have 5 processes waiting, etc. (there's an example in the load test below). The actual script looks something like this:
    #!/usr/local/bin/fpgtcl

    set db [pg_connect payroll]

    while {[FCGI_Accept] >= 0} {
        puts -nonewline "Content-type: text/html\r\n\r\n"
        puts "<h2>List Employees</h2>"
        puts "<table border=1>"
        pg_select $db {
            select emp_no, emp_name
            from emp
            order by emp_no
        } got_back {
            puts "<tr><td>$got_back(emp_no)</td><td>$got_back(emp_name)</td></tr>"
        }
        puts "</table>"
    }

    pg_disconnect $db
The important things I was after are exemplified here. The loop with the FCGI_Accept is where the processing for each CGI request happens. Before the loop begins, the connection to the database is opened with pg_connect. The connection stays open for the life of the script (many, many requests).
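One refinement worth considering (my own addition, not part of any distribution): wrap the body of each request in catch, so that a bad query returns an error page instead of killing the pooled process. For a read-only page like this one that's safe; see the editor's note below about why you want to be more careful when requests write data.

    while {[FCGI_Accept] >= 0} {
        puts -nonewline "Content-type: text/html\r\n\r\n"
        if {[catch {
            # ... pg_select and page generation go here ...
        } err]} {
            # The process (and its database connection) survives the error.
            puts "<h2>Error</h2><pre>$err</pre>"
        }
    }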

I did some rough tests. I compared this script with FastCGI turned on and off. With FastCGI, it was about 10 times faster. This was just hitting the server locally, requesting the page, reading the whole page, and then repeating. The database I was hitting had 15 records in it. But the point is that the forking overhead is gone.

In another test, I set up the server with 5 of these FastCGI processes in the queue. I then hit the server from 3 neighboring Unix machines with 3 processes each, so potentially I could have had 9 simultaneous connections. I did 4500 hits in 2:28, or about 30 hits a second. I did get about 50 requests that didn't get the proper data back, but I'm sure I could just increase my pool of FastCGIs.
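Growing the pool is just a matter of AppClass arguments. As I recall from the mod_fastcgi docs (check the docs for the version you have), it looks something like this:

    AppClass /home/apache/docs/payroll/list_employees.fcgi -processes 5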

Remaining problem

I haven't had much time with PostgreSQL, so I don't know how good it is. It seems to have a lot of people connected with it, so somebody is using it (or promoting it pretty well). I'd like to hear from people who are using it (in a web context or not) and what they think of it. I don't know how well it will scale. It works great with tiny datasets.

Another thing that's missing here is full text indexing. There are a lot of hooks in PostgreSQL for adding features, and it should be possible to add a new indexing scheme for full text (maybe I have some code lying around here, somewhere).

Greenspun Chimes In

Editor: I can't resist commenting a bit on Patrick's article. My main caveat is that if you're not running your RDBMS in autocommit mode, you have to be careful about reusing database connections. For example, if one request issues a bunch of DML statements but then neither commits nor rolls back, presumably because there was some kind of error, you want to make sure that the next user of the database connection does not issue some more DML statements and then COMMIT the whole bunch. I sweated over this a bit when we wrote an Oracle driver for AOLserver. I think Patrick is probably OK because an error would result in his FastCGI process dying and getting restarted. When an RDBMS client (CGI script) dies, the database will roll back the work.
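In pgTcl terms, a careful version of a request that writes data might look something like this (a sketch only, reusing the emp table from Patrick's example and assuming pg_exec plus explicit BEGIN/COMMIT/ROLLBACK statements):

    while {[FCGI_Accept] >= 0} {
        puts -nonewline "Content-type: text/html\r\n\r\n"
        pg_exec $db "BEGIN"
        if {[catch {
            pg_exec $db "update emp set emp_name = 'Smith' where emp_no = 7"
            pg_exec $db "COMMIT"
        } err]} {
            # Never leave a half-finished transaction around for the
            # next request on this connection.
            pg_exec $db "ROLLBACK"
            puts "<h2>Error</h2><pre>$err</pre>"
        }
    }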

If you have $1000 in cash to spare, less time and Unix expertise than Patrick, and don't need source code, I personally think you'll probably be better off with Solid + AOLserver. Solid is normally $2000 but if you tell them you read about it in the CD-ROM for my book then they will let you have it for $1000. AOLserver is free. My reasons for preferring Solid are that I think it is more carefully written than any of the freeware SQL programs. My reasons for preferring AOLserver are that it has a bunch of useful API calls such as "grab me a Web page from that other server" or "send email".


asj-editors@arsdigita.com

Reader's Comments

Note:

AOLserver does not work with LinuxPPC as of March 30, 1998. The Mac person on the AOLserver team informed me that there are no current plans to support AOLserver on LinuxPPC, but that he "wants to get it running on his G3 at home." So... that lame 7200 gathering dust at work isn't going to be a good test machine for AOLserver.

-- Nathanial Robertson, March 30, 1998

I like Patrick's recipe, but I have a few ingredient changes that work for me as of April 1998:

Linux (Red Hat 5.0), AOLserver 2.3 (beta4), PostgreSQL 6.3.1

AOLserver's built-in support for Postgres is top-notch work. All of Philg's examples from Database Backed Web Sites run under the AOL/Postgres combo except for the ones requiring the full-text indexing of Illustra/PLS. I regularly have uptimes in the months, with a moderately loaded PPro 200.

For those that don't have the $500 to spend on Solid WebEngine for Linux to use with AOLserver, PostgreSQL (versions 6.2.1 and later) works nicely.



-- Lamar Owen, April 20, 1998

I would like to suggest an alternate recipe. As far as freeware tools go, you might want to look into PHP3; it fits the bill of a "Dynamic Page Language". It's free, well supported, and runs on most platforms (I think there is even a Win32 version). It runs as an Apache module (so it's fast) and has a lot of hooks into databases built in. You see it used a lot with MySQL, but it works well with other SQL databases.

-- chuck freitas, November 12, 1998
Just as an FYI, AOLserver 3.0 is now open source. This, along with improvements in open source database systems, expands the possibilities for db-backed web sites.

Thanks, AOL! Thanks also to Philip, who I'm sure had more than a little influence in this decision. ;-)

-- Frank Wortner, September 1, 1999

I've done this, though not quite in the way suggested (the FastCGI thing sounds like a hack, in particular). Apache + mod_perl + DBI + DBD::Pg + PostgreSQL works fine. That said, the project I've developed on this will be migrated to MySQL, since (roughly speaking) it's mainly SELECTs and the overhead of having a "proper" database is undesirable.

I don't know much about Tcl, other than a lasting (and irrational) dislike engendered by the ugly "make xconfig" tool in the Linux kernel source. So I can't comment on how to do things this way. But the Perl interface is quite neat (you can even make it object oriented, though I would question whether you would want to do this in many cases).



-- Chris Lightfoot, September 18, 1999

Also in the "I like Perl, not Tcl" stakes is HTML::Mason, which offers Perl in pages (and, being built on mod_perl, doesn't fork). It's also fully buzzword compliant, offering templating, caching, and all that other stuff in glossy brochures. Most of it even works pretty well. (Personally, I'm quite comfortable with Tcl or Perl, but I enjoy working in Perl more.)

And allow me to heartily endorse Postgresql. 7.0+ is very nice indeed.



-- Rodger Donaldson, August 3, 2000
This is my first attempt at posting, so here goes... I like how this thing is free, but one thing I rarely notice is any comment about MySQL. I've been using MySQL with Perl for the past year and it's been quite fast, even on a single-processor 450MHz Ultra 10. Since I don't see much about it, I was wondering if there is anything inherently wrong with this choice that I should know about. I know many that swear by it and many that despise it, but I have yet to get any hard evidence. I've just started with Tcl and I'm slowly catching on to its little ins and outs.

-- Christopher Brown, September 13, 2000

For MySQL, you could look at the opinion of the OpenACS team on why they chose Postgresql over MySQL.

The main things that bug me about MySQL, beyond the lack of transactional integrity, are these:

  • MySQL has a limited (unkind people would call it brain-dead) dialect of SQL. Without tools like subselects, programming with MySQL involves a lot of "do two SELECTs and kludge them together" nonsense (see the example after this list).
  • MySQL doesn't have any concept of referential integrity; programmers waste a large amount of time manually touching data across several tables. Postgresql allows the common SQL constraints (like cascading foreign keys), so the database can take care of changes (just like it should).
  • MySQL locking is laughable; I'm sorry, but that's the only word for it. I shouldn't have to manually lock data, and writing should not block reading. If I wanted to worry about locking strategies, I'd write my own damn database.
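For instance (a generic illustration; the emp_audit table is made up), in Postgresql you can write the subselect directly:

    select emp_name
    from emp
    where emp_no in (select emp_no from emp_audit where flagged = 't');

In MySQL of this era you would run the inner SELECT first, then splice the resulting IDs into a second query in your application code.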

Recent real world-ish and benchmarketing tests have suggested that the main real benefit of MySQL - read speed - does not exist to the degree claimed.

As always, YMMV. If you like doing extra work when you don't have to and don't mind the possibility of data disappearing with a giant sucking sound, MySQL is great.



-- Rodger Donaldson, September 27, 2000
I've been using Postgres 7 for a while now, and find it fast and stable. I really love the PHP module for Apache, because it is very fast and handles Postgres beautifully.

I know Phil G likes his AOLserver for ns_httpget and such, but I think it is a bit more sensible to separate out the application logic and let the web server just handle requests. I do see how having Tcl compiled in makes the performance a near non-issue, but with the PHP module running inside your server process, I really think you can take your pick. 'Course, you sacrifice all that sweet ArsDigita Tcl code (maybe I'll volunteer to help write the Apache-PHP port of OpenACS)...

-- Paul Smith, February 6, 2001