ArsDigita Archives
 
 
   
 
spacer

Database Access API

by Jon Salz

ACS Documentation : ACS Core Architecture Guide : Database Access API


Backward Incompatibility Alert

As of ACS 3.4, all db_* calls that accept a SQL/DML/DDL statement as an argument now require a statement name to be provided as the first argument. See SQL Abstraction below for details.
  • Tcl procedures: /packages/acs-core/10-database-procs.tcl

The Big Picture

One of ACS's great strengths is that code written for it is very close to the database. It is very easy to interact with the database from anywhere within ACS. Our goal is to develop a coherent API for database access which makes this even easier.

There were four significant problems with the way ACS previously used the database (i.e., directly through the ns_db interface):

  1. Handle management. We required code to pass database handles around, and for routines which needed to perform database access but didn't receive a database handle as input, it was difficult to know from which of the three "magic pools" (main, subquery, and log) to allocate a new handle.

  2. Nested transactions. In our Oracle driver, begin transaction really means "turn auto-commit mode off" and end transaction means "commit the current transaction and turn auto-commit mode on." Thus if transactional code needed to call a routine which needed to operate transactionally, the semantics were non-obvious. Consider:
    proc foo { db args } {
        db_transaction {
          ...
        }
    }
    
    db_transaction {
        db_dml unused "insert into greeble(bork) values(33)"
        foo $db
        db_dml unused "insert into greeble(bork) values(50)"
    }
    This would insert greeble #33 and do all the stuff in foo transactionally, but the end transaction in foo would actually cause a commit, and greeble #50 would later be inserted in auto-commit mode. This could cause subtle bugs: e.g., in the case that the insert for greeble #50 failed, part of the "transaction" would have already have been committed!. This is not a good thing.

  3. Unorthodox use of variables. The standard mechanism for mapping column values into variables involved the use of the set_variables_after_query routine, which relies on an uplevel variable named selection (likewise for set_variables_after_subquery and subselection).

  4. Hard-coded reliance on Oracle. It's difficult to write code supporting various different databases (dynamically using the appropriate dialect based on the type of database being used, e.g., using DECODE on Oracle and CASE ... WHEN on Postgres).
The Database Access API addresses the first three problems by:
  1. making use of database handles transparent
  2. wrapping common database operations (including transaction management) in Tcl control structures (this is, after all, what Tcl is good at!)
It lays the groundwork for addressing the fourth problem by assigning each SQL statement a logical name. In a future version of the ACS Core, this API will translate logical statement names into actual SQL, based on the type of database in use. (To smooth the learning curve, we provide a facility for writing SQL inline for a "default SQL dialect", which we assume to be Oracle for now.)

To be clear, SQL abstraction is not fully implemented in ACS 3.4. The statement names supplied to each call are not used by the API at all. The API's design for SQL abstraction is in fact incomplete; unresolved issues include:

  • how to add WHERE clause criteria dynamically
  • how to build a dynamic ORDER BY clause (Ben Adida has a proposed solution for this)
  • how to define a statement's formal interface (i.e., what bind variables it expects, what columns its SELECT clause must contain if it's a query) without actually implementing the statement in a specific SQL dialect
So why is the incremental change of adding statement naming to the API worth the effort? It is worth the effort because we know that giving each SQL statement a logical name will be required by the complete SQL abstraction design. Therefore, we know that the effort will not be wasted, and taking advantage of the new support for bind variables will already require code that uses 3.3.0 version of the API to be updated.

The Bell Tolls for set_variables_after_query

set_variables_after_query is gone! (Well, it's still there, but you'll never need to use it.) The new API routines set local variables automatically. For instance:
db_1row select_names "select first_names, last_name from users where user_id = [ad_get_user_id]"
doc_body_append "Hello, $first_names $last_name!"
Like ns_db 1row, this will bomb if the query doesn't return any rows (no such user exists). If this isn't what you want, you can write:
if { [db_0or1row select_names "select first_names, last_name from users where user_id = [ad_get_user_id]"] } {
    doc_body_append "Hello, $first_names $last_name!"
} else {
    # Executed if the query returns no rows.
    doc_body_append "There's no such user!"
}
Selecting a bunch of rows is a lot prettier now:
db_foreach select_names "select first_names, last_name from users" {
     doc_body_append "Say hi to $first_names $last_name for me!<br>"
}
That's right, db_foreach is now like ns_db select plus a while loop plus set_variables_after_query plus an if statement (containing code to be executed if no rows are returned).
db_foreach select_names "select first_names, last_name from users where last_name like 'S%'" {
     doc_body_append "Say hi to $first_names $last_name for me!<br>"
} if_no_rows {
     doc_body_append "There aren't any users with last names beginnings with S!"
}

Handle Management

The new API keeps track of which handles are in use, and automatically allocates new handles when they are necessary (e.g., to perform subqueries while a select is active). For example:
doc_body_append "<ul>"
db_foreach select_names "select first_names, last_name, user_id from users" {
    # Automatically allocated a database handle from the main pool.
    doc_body_append "<li>User $first_names $last_name\n<ul>"

    db_foreach select_groups "select group_id from user_group_map where user_id = $user_id" {
        # There's a selection in progress, so we allocated a database handle
        # from the subquery pool for this selection.
        doc_body_append "<li>Member of group #$group_id.\n"
    } if_no_rows {
        # Not a member of any groups.
        doc_body_append "<li>Not a member of any group.\n"
    }
}
doc_body_append "</ul>"
db_release_unused_handles
A new handle isn't actually allocated and released for every selection, of course - as a performance optimization, the API keeps old handles around until db_release_unused_handles is invoked (or the script terminates).

Note that there is no analogue to ns_db gethandle - the handle is always automatically allocated the first time it's needed.

Bind Variables

We often write queries and DML like:
db_dml presentation_delete "delete from wp_presentations where presentation_id = $presentation_id"
But what if $presentation_id isn't an integer - it's something like "3 or 1 = 1", yielding a query like
delete from wp_presentations where presentation_id = 3 or 1 = 1
which deletes every presentation in the database? We can get around this by carefully checking every string included in a SQL query, but a safer way is to use bind variables. This enables the following syntax:
db_dml presentation_delete "
    delete from wp_presentations where presentation_id = :presentation_id
"
The value for presentation_id is pulled from the Tcl variable $presentation_id (in the current stack frame), so even if input checking is spotty and $presentation_id is set to 3 or 1 = 1 as above, Oracle will attempt to delete presentations whose presentation ID is literally '3 or 1 = 1' (i.e., no presentations, since '3 or 1 = 1' can't possibly be a valid integer primary key for wp_presentations. In general, since Oracle always considers the values of bind variables to be literals, it becomes more difficult for users to perform URL surgery to trick scripts into running dangerous queries and DML.

Every db_* command accepting a SQL command as an argument now supports bind variables. You can either

  • specify the -bind switch to provide a set with bind variable values, or
  • specify the -bind switch to explicitly provide a list of bind variable names and values, or
  • not specify a bind variable list at all, in which case Tcl variables are used as bind variables.
The default behavior (i.e., if the -bind switch is omitted) is that these procedures expect to find local variables that correspond in name to the referenced bind variables, e.g.:
set user_id 123456
set role "administrator"

db_foreach user_group_memberships_by_role {
    select g.group_id, g.group_name
    from user_groups g, user_group_map map
    where g.group_id = map.user_id
    and map.user_id = :user_id
    and map.role = :role
} {
    # do something for each group of which user 123456 is in the role
    # of "administrator"
}
The value of the local Tcl variable user_id (123456) is bound to the user_id bind variable.

The -bind switch can takes the name of an ns_set containing keys for each bind variable named in the query, e.g.:

set bind_vars [ns_set create]
ns_set put $bind_vars user_id 123456
ns_set put $bind_vars role "administrator"

db_foreach user_group_memberships_by_role {
    select g.group_id, g.group_name
    from user_groups g, user_group_map map
    where g.group_id = map.user_id
    and map.user_id = :user_id
    and map.role = :role
} -bind $bind_vars {
    # do something for each group in which user 123456 has the role
    # of "administrator"
}
Alternatively, as an argument to -bind you can specify a list of alternating name/value pairs for bind variables:
db_foreach user_group_memberships_by_role {
    select g.group_id, g.group_name
    from user_groups g, user_group_map map
    where g.group_id = map.user_id
    and map.user_id = :user_id
    and map.role = :role
} -bind [list user_id 123456 role "administrator"] {
    # do something for each group in which user 123456 has the role
    # of "administrator"
}

SQL Abstraction

We now require that each SQL statement be assigned a logical name for the statement that is unique to the procedure or page in which it is defined. This is so that (eventually) we can implement logically named statements with alternative SQL for non-Oracle databases (e.g., Postgres). More on this later.

Nulls and Bind Variables

When processing a DML statement, Oracle coerces empty strings into null. (This coercion does not occur in the WHERE clause of a query, i.e. col = '' and col is null are not equivalent.)

As a result, when using bind variables, the only way to make Oracle set a column value to null is to set the corresponding bind variable to the empty string, since a bind variable whose value is the string "null" will be interpreted as the literal string "null".

These Oracle quirks complicate the process of writing clear and abstract DML difficult. Here is an example that illustrates why:

#
# Given the table:
#
#   create table foo (
#           bar        integer,
#           baz        varchar(10)
#   );
#

set bar ""
set baz ""

db_dml foo_create "insert into foo(bar, baz) values(:bar, :baz)"
#
# the values of the "bar" and "baz" columns in the new row are both
# null, because Oracle has coerced the empty string (even for the
# numeric column "bar") into null in both cases
Since databases other than Oracle do not coerce empty strings into null, this code has different semantics depending on the underlying database (i.e., the row that gets inserted may not have null as its column values), which defeats the purpose of SQL abstraction.

Therefore, the Database Access API provides a database-independent way to represent null (instead of the Oracle-specific idiom of the empty string): db_null.

Use it instead of the empty string whenever you want to set a column value explicitly to null, e.g.:

set bar [db_null]
set baz [db_null]

db_dml foo_create "insert into foo(bar, baz) values(:bar, :baz)"
#
# sets the values for both the "bar" and "baz" columns to null

Placing Column Values in Arrays and Sets

Normally, db_foreach, db_0or1row, and db_1row places the results of queries in Tcl variables, so you can say:
db_foreach users_select "select first_names, last_name from users" {
    doc_body_append "<li>$first_names $last_name\n"
}
However, sometimes this is not sufficient: you may need to examine the rows returned, to dynamically determine the set of columns returned by the query, or to avoid collisions with existing variables. You can use the -column_array and -column_set switches to db_foreach, db_0or1row, and db_1row to instruct the database routines to place the results in a Tcl array or ns_set, respectively, where the keys are the column names and the values are the column values. For example:
db_foreach users_select "select first_names, last_name from users" -column_set columns {
    # Now $columns is an ns_set.
    doc_body_append "<li>"
    for { set i 0 } { $i < [ns_set size $columns] } { incr i } {
        doc_body_append "[ns_set key $columns $i] is [ns_set value $columns $i]. \n"
    }
}
will write something like:
  • first_names is Jon. last_name is Salz.
  • first_names is Lars. last_name is Pind.
  • first_names is Michael. last_name is Yoon.

API

Note that you never have to use ns_db anymore (including ns_db gethandle)! Just start doing stuff, and (if you want) call db_release_unused_handles when you're done as a hint to release the database handle.

db_null
Returns a value which can be used in a bind variable to represent the SQL value null. See Nulls and Bind Variables above.

db_nextval sequence
Returns the next value for a sequence. Ultimately this will cache a block of sequence values to save hits to the database.

db_foreach statement-name sql [ -bind bind_set_id | -bind bind_value_list ] \
    [ -column_array array_name | -column_set set_name ] \
    code_block [ if_no_rows if_no_rows_block ]
Performs the SQL query sql, executing code_block once for each row with variables set to column values (or a set or array populated if -column_array or column_set is specified). If the query returns no rows, executes if_no_rows_block (if provided).

Example:

db_foreach select_foo "select foo, bar from greeble" {
    doc_body_append "<li>foo=$foo; bar=$bar\n"
} if_no_rows {
    doc_body_append "<li>There are no greebles in the database.\n"
}
The code block may contain break statements (which terminate the loop and flush the database handle) and continue statements (which continue to the next row of the loop).

db_1row statement-name sql [ -bind bind_set_id | -bind bind_value_list ] \
    [ -column_array array_name | -column_set set_name ]
Performs the SQL query sql, setting variables to column values. Raises an error if the query does not return exactly 1 row.

Example:

db_1row select_foo "select foo, bar from greeble where greeble_id = $greeble_id"
# Bombs if there's no such greeble!
# Now $foo and $bar are set.
db_0or1row statement-name sql [ -bind bind_set_id | -bind bind_value_list ] \
    [ -column_array array_name | -column_set set_name ]
Performs the SQL query sql. If a row is returned, sets variables to column values and returns 1. If no rows are returned, returns 0. If more than one row is returned, throws an error.

db_string statement-name sql [ -default default ] [ -bind bind_set_id | -bind bind_value_list ]
Returns the first column of the result of SQL query sql. If sql doesn't return a row, returns default (or throws an error if default is unspecified). Analogous to database_to_tcl_string and database_to_tcl_string_or_null.

db_list statement-name sql [ -bind bind_set_id | -bind bind_value_list ]
Returns a Tcl list of the values in the first column of the result of SQL query sql. If sql doesn't return any rows, returns an empty list. Analogous to database_to_tcl_list.

db_list_of_lists statement-name sql [ -bind bind_set_id | -bind bind_value_list ]
Returns a Tcl list, each element of which is a list of all column values in a row of the result of SQL query sql. If sql doesn't return any rows, returns an empty list. (Analogous to database_to_tcl_list_list.)

db_dml statement-name sql \
    [ -bind bind_set_id | -bind bind_value_list ] \
    [ -blobs blob_list | -clobs clob_list |
      -blob_files blob_file_list | -clob_files clob_file_list ]

Performs the DML or DDL statement sql.

If a length-n list of blobs or clobs is provided, then the SQL should return n blobs or clobs into the bind variables :1, :2, ... :n. blobs or clobs, if specified, should be a list of individual BLOBs or CLOBs to insert; blob_files or clob_files, if specified, should be a list of paths to files containing the data to insert. Only one of -blobs, -clobs, -blob_files, and -clob_files may be provided.

Example:

db_dml insert_photos "
        insert photos(photo_id, image, thumbnail_image)
        values(photo_id_seq.nextval, empty_blob(), empty_blob())
        returning image, thumbnail_image into :1, :2
    "  -blob_files [list "/var/tmp/the_photo" "/var/tmp/the_thumbnail"] 
This inserts a new row into the photos table, with the contents of the files /var/tmp/the_photo and /var/tmp/the_thumbnail in the image and thumbnail columns, respectively.

db_write_clob statement-name sql [ -bind bind_set_id | -bind bind_value_list ]
db_write_blob statement-name sql [ -bind bind_set_id | -bind bind_value_list ]
db_blob_get_file statement-name sql [ -bind bind_set_id | -bind bind_value_list ]
Analagous to ns_ora write_clob/write_blob/blob_get_file.

db_exec_plsql statement-name sql
    [ -bind bind_set_id | -bind bind_value_list ]
Executes a block of PL/SQL code (which must be surrounded by BEGIN and END;). If the bind variable :1 is provided in the statement, it is considered an OUT bind variable and its value is returned.

Example:

db_exec_plsql last_visit_rotate ":1 := sec_rotate_last_visit(:browser_id, :current_time)"
db_release_unused_handles
Releases any allocated, unused database handles.

db_transaction transaction_code [ on_error error_code ]
Executes transaction_code with transactional semantics. This means that either all of the database commands within transaction_code are committed to the database or none of them are. Nested transactions are supported (end transaction is transparently ns_db dml'ed when the outermost transaction completes).

To handle errors, use db_transaction {transaction_code} on_error {error_code}. Any error generated in transaction_code will be caught automatically and process control will transfer to error_code with a variable errmsg set. The error_code block can then clean up after the error, such as presenting a usable error message to the user. Following the execution of error_code the transaction will be aborted. Alternatively, a command to continue the transaction db_continue_transaction can be issued. This command will commit any successful database commands when the transaction completes, assuming no further errors are raised. If you want to explicity abort the transaction, call db_abort_transaction from within the transaction_code block or the error_code block.

Example 1:
In this example, db_dml triggers an error, so control passes to the on_error block which prints a readable error.

    db_transaction {
	db_dml test "nonsense"
    } on_error {
	ad_return_complaint "The DML failed."
    }
    
Example 2:
In this example, the second command, "nonsense" triggers an error. There is no on_error block, so the transaction is immediately halted and aborted.
    db_transaction {
	db_dml test {insert into footest values(1)}
	nonsense
	db_dml test {insert into footest values(2)}
    } 
    
Example 3:
In this example, all of the dml statements are executed and committed. The call to db_abort_transaction signals that the transaction should be aborted which activates the higher level on_error block. That code issues a db_continue_transaction which commits the transaction. Had there not been an on_error block, none of the dml statements would have been committed.
    db_transaction {
	db_dml test {insert into footest values(1)}
	db_transaction {
	    db_dml test {insert into footest values(2) }
	    db_abort_transaction
	}
	db_dml test {insert into footest values(3) }
    } on_error {
	db_continue_transaction
    }
    
Example 4:
proc replace_the_foo { col } {
    db_transaction {
        db_dml "delete from foo"
        db_dml "insert into foo(col) values($col)"
    }
}

proc print_the_foo {} {
    doc_body_append "foo is [db_string "select col from foo"]<br>\n"
}

replace_the_foo 8
print_the_foo ; # Writes out "foo is 8"

db_transaction {
    replace_the_foo 14
    print_the_foo ; # Writes out "foo is 14"
    db_dml "insert into some_other_table(col) values(999)"
    ...
    db_abort_transaction
}

print_the_foo ; # Writes out "foo is 8"

db_abort_transaction
Aborts all levels of a transaction. That is if this is called within several nested transactions, all of them are terminated. Use this insetead of db_dml "abort" "abort transaction".

db_continue_transaction
If a transaction is set to be aborted, this procedure allows it to continue. Intended for use within a db_transaction on_error code block only.

db_resultrows
Returns the number of rows affected or inserted by the previous DML statement.

db_with_handle var code_block
Places a database handle into the variable var and executes code_block. This is useful when you don't want to have to use the new API (db_foreach, db_1row, etc.), but need to use database handles explicitly.

Example:

proc lookup_the_foo { foo } {
    db_with_handle db {
        return [database_to_tcl_string $db "select ..."]
    }
}

db_with_handle db {
    # Now there's a database handle in $db.
    set selection [ns_db select $db "select foo from bar"]
    while { [ns_db getrow $db $selection] } {
        set_variables_after_query

        lookup_the_foo $foo
    }
}
db_nullify_empty_string string
For true SQL purists, we provide the convenience function db_nullify_empty_string, which returns [db_null] if its string argument is the empty string and can be used to encapsulate another Oracle quirk:
set baz ""

# Clean out the foo table
#
db_dml unused "delete from foo"

db_dml unused "insert into foo(baz) values('$baz')"

set n_rows [db_string unused "select count(*) from foo where baz is null"]
#
# $n_rows is 1; in effect, the "baz is null" criterion is matching
# the empty string we just inserted (because of Oracle's coercion
# quirk)
To balance out this asymmetry, you can explicitly set baz to null by writing:
db_dml foo_insert "insert into foo(baz) values(:1)" {[db_nullify_empty_string $baz]}

Implementation Design (work in progress)

The ideas here are preliminary, so please send feedback to
michael@arsdigita.com. There may well be a much simpler, superior design that I (Michael) am just missing right now. If so, please let me know!

The basic idea is to translate the logical statement-name into an actual SQL statement, written in the appropriate SQL dialect for the RDBMS that is in use. The sql argument is essentially a convenience that enables the SQL for the "default dialect" to be written inline. For 3.4, we will probably use configuration parameters to tell the Database Access API what the default dialect is and what dialect is actually in use:

[ns/server/server_name/acs]
...
DefaultSQLDialect=oracle8
SQLDialect=postgres7
(An alternative approach would be to use the ACS Package Manager, i.e., install a "pseudo-package" with no actual code to indicate what RDBMS is installed. Then, the Database Access API could query the APM to figure what SQL dialect to employ.)

For instructing the Database Access API to translate a named statement in a specific SQL dialect, we may define a new API call:

db_implement_statement statement_location statement_name sql_dialect sql
which would be called at server initialization time. The Database Access API will then know to use the SQL statement appropriate for the specified SQLDialect. (The name db_implement_statement is very tentative.)

Issues:

  • Is making the caller of db_implement_statement explicitly specify the statement location (e.g., "/bboard/q-and-a") too much of a pain? Can we make this more convenient somehow?

  • In the case that the inline SQL is not in the specified SQLDialect, reading the rewritten SQL into memory for the life of the server may not be a good idea. The three basic approaches I can think of to implement the db_implement_statement API are:
    1. Cache the rewritten SQL for the appropriate SQL dialect in an nsv array
    2. Cache the rewritten SQL for the appropriate SQL dialect in a special database table that we keep pinned in memory
    3. Cache the rewritten SQL for the appropriate SQL dialect in a special file, maybe even a DBM file

  • Given the above two issues, should we rethink the db_implement_statement API altogether?

    One possibility is a file-based approach, where the alternative SQL statements would live in conventionally named and located files, e.g., /bboard/q-and-a.postgres7 would contain Postgres 7 versions of the SQL statements in /bboard/q-and-a.tcl.) A potential con of this approach is that the Database Access API would have to perform file I/O for every SQL statement that's been rewritten. This may be a non-issue; I don't actually know. (We could augment this approach with caching too, perhaps a fixed-size LRU cache.)

    Another similar approach would be just to have one massive, magic file for each SQL dialect that maps each statement identifier (location plus name) to the corresponding statement.

  • Another larger problem is the fact that this design does not work for instances where we build a SQL statement based on control flow logic, e.g., we sometimes join in an extra table based on the user input. This problem doesn't mean that the design as a whole is broken; it just means that this design alone does not get us all the way to full SQL abstraction.
Version 2.1 of the ArsDigita Oracle Driver adds a set of ns_ora analogs for the following ns_db calls: 0or1row, 1row, select, and dml. (It also adds ns_ora array_dml.) Thus, the groundwork for implementing the above API for ACS/Oracle is already established.

We plan to defer to the OpenACS team for the Postgres implementation of the API.


jsalz@mit.edu
spacer