ACS Package Manager (APM)
by Jon Salz, Michael Yoon, and Lars Pind
The Big Picture
In general terms, a
package is a unit of software
that serves a single well-defined purpose. That purpose may be to
provide a service directly to one or more classes of end-user (e.g.,
discussion forums and file storage for community members, user
profiling tools for the site publisher), or it may be to act as a
building block for other packages (e.g., an application programming
interface (API) for storing and querying access control rules, or an
API for scheduling email alerts). Thus, packages fall into one of two
categories:
- Application packages: a "program or group of
programs designed for end users" (the Webopedia
definition); also known as modules, for historical
reasons
- Library packages: the aforementioned building
blocks
The ACS is itself a collection of interdependent library and application
packages. Prior to ACS 3.3, all packages were lumped together into one
monolithic distribution without explicit boundaries; the only way to
ascertain what comprised a given package was to look at the top of the
corresponding documentation page, where, by convention, the package
developer would specify where to find:
- the data model
- the Tcl procedures
- the user-accessible pages
- the administration pages
Experience has shown us that this lack of explicit boundaries causes a
number of maintainability problems for pre-3.3 installations:
- Package interfaces were not guaranteed to be stable in any formal
way, so a change in the interface of one package would often break
dependent packages (which we would only discover through manual
regression testing). In this context, any of the following could
constitute an interface change:
- renaming a file or directory that appears in a URL
- changing what form variables are expected as input by a page
- changing a procedural abstraction, e.g., a PL/SQL or Java stored
procedure or a Tcl procedure
- changing a functional abstraction, e.g., a database view or a
PL/SQL or Java stored function
- changing the data model
This last point is especially important. In most cases, changing the
data model should not affect dependent packages. Rather, the
package interface should provide a level of abstraction above the data
model (as well as the rest of the package implementation). Then, users
of the package can take advantage of implementation improvements that
don't affect the interface (e.g., faster performance from intelligent
denormalization of the data model), without having to worry that code
outside the package will now break.
- A typical ACS-backed site only uses a few of the modules included
in the distribution, yet there was no well-understood way to pick only
what you need when installing the ACS, or even to uninstall what you
didn't need, post-installation. Unwanted code had to be removed
manually.
- Releasing a new version of the ACS was complicated, owing again to
the monolithic nature of the software. Since we released everything in
the ACS together, all threads of ACS development had to converge on a
single deadline, after which we would undertake a focused QA effort
whose scale increased in direct proportion to the expansion of the ACS
codebase.
- There was no standard way for developers outside of ArsDigita to
extend the ACS with their own packages. Along the same lines,
ArsDigita programmers working on client projects had no standard way
to keep custom development cleanly separated from ACS
code. Consequently, upgrading the ACS once installed was an
error-prone and time-consuming process.
The ACS is basically a platform for web-based application software,
and any software platform has the potential to develop problems like
these. Fortunately, there are many precedents for systematic ways of
avoiding them, including:
Borrowing from all of the above, ACS 3.3 introduces its own package
management system, the ACS Package Manager (APM), which consists of:
- a standard format for APM packages (also called
"ACS packages"), including:
- version numbering, independent of any other package and the ACS as
a whole
- specification of the package interface
- specification of dependencies on other packages (if any)
- attribution (who wrote it) and ownership (who maintains it)
- web-based tools for package management, i.e.:
- obtaining packages from a remote distribution point
- installing packages, if and only if:
- all prerequisite packages are installed
- no conflicts will be created by the installation
- configuring packages (obsoleting the monolithic ACS configuration
file) [ACS4]
- upgrading packages, without clobbering local modifications
- uninstalling unwanted packages
- a registry of installed packages, database-backed
and integrated with filesystem-based version control
- web-based tools for package development,
i.e.:
- creating new packages locally
- releasing new versions of locally-created packages
Consistent use of the APM format and tools will go a long way toward
solving the maintainability problems listed above. Moreover, APM is
the substrate that will enable us to soon establish a central package
repository, where both ArsDigita and third-party developers will be
able to publish their packages for other ACS users to download and
install.
For a simple illustration of the difference between ACS without APM
(pre-3.3) and ACS with APM (3.3 and beyond), consider a hypothetical
ACS installation that uses only two of the thirty-odd modules
available circa ACS 3.2 (say, bboard and ecommerce):
APM itself is part of a package, ACS Core, a library
package that is the only mandatory component of an ACS installation.
The Components of an APM Package
An APM package consists of:
- A set of interfaces
- Implementations of those interfaces
- Documentation
- A package specification
Package Interfaces
There are three types of interface that an APM package can define:
- application programming interface (API): A
stable, well-documented set of methods for interacting with the
package programmatically, either to query it for information or to
command it to perform an action.
- user interface (UI): For each class of end-user
(e.g., community member, site administrator), a set of web pages that
provides a stable set of features.
- configuration interface: A stable set of
parameters that can be used to control the behavior of the package,
whose values can be set non-programmatically, i.e., with a
configuration file and/or through a user interface.
By definition, an application package provides a UI but may or may not
provide an API. Conversely, a library package provides an API but may
or may not provide a UI. A configuration interface is optional for
either type of package.
Package Implementation
Implementation varies by type of interface:
- APIs are implemented as one or more of the following: PL/SQL or
Java stored procedures and functions, database views, Tcl library
procedures, linkable URLs (e.g., /user-search).
- UIs are implemented as one or more of the following: HTML pages,
Tcl pages, AOLserver Dynamic Pages (ADPs), registered procedures.
- Virtually all API and UI implementations include a database
schema (a.k.a. data model).
- Currently, the standard way to implement a package's configuration
interface is through an auxiliary AOLserver configuration file. A
database-backed, generic configuration facility will be introduced in
version 4.0 of the ACS Core package.
(Note that we now consider the database schema to be part of the
package implementation, not the package interface. In other words, the
only code that should execute queries or DML against a package's
schema is the package's own implementation code. There are legacy
violations of this rule that will be corrected incrementally.)
Package Documentation
A package must contain one or more of the following types of
documentation:
- High-level design documentation, written in lay terms ("The Big
Picture"); every package should have this.
- API documentation for programmers writing code that
depends on the package
- "Help" pages for end-users (with good UI design, we
shouldn't need too many of these)
- Configuration instructions for administrators who have installed
the package on their site: what parameters are available and, for each
parameter, what values are valid
- Implementation documentation for the package maintainer ("Under
the Hood"), e.g., descriptions of any optimizations like
denormalization or caching, periodic processes (i.e., scheduled
procedures), external programs or scripts used, etc.
Package Specification: The .info file
The package specification is an XML document that lists:
- properties of the package such as name, version, owner
- the interfaces that the package provides
- the external interfaces upon which the package depends
- the names and types of all files included in the package
Package specifications are typically not authored manually; rather,
APM provides a UI for generating them.
Here is a sample excerpt from the specification of the ACS Core
package itself:
<?xml version="1.0"?>
<!-- Generated by the ACS Package Manager -->
<package key="acs-core" url="http://software.arsdigita.com/packages/acs-core">
<version name="3.3.0" url="http://software.arsdigita.com/packages/acs-core-3.3.0.apm">
<package-name>ACS Core</package-name>
<owner url="mailto:jsalz@mit.edu">Jon Salz</owner>
<summary>Routines and data models providing the foundation for ACS-based Web services.</summary>
<release-date>2000-06-03</release-date>
<vendor url="http://www.arsdigita.com/">ArsDigita Corporation</vendor>
<provides url="http://software.arsdigita.com/packages/developer-support/tcl-api" version="0.2d"/>
<!-- No included packages -->
<files>
<file type="tcl_procs" path="00-proc-procs.tcl"/>
<file type="tcl_procs" path="10-database-procs.tcl"/>
...
</files>
</version>
</package>
The only attributes of the <package> element itself are key and url. The
key attribute is a default short name for the package that appears in the
APM site administrator UI; to prevent namespace collisions, the key is not
fixed but can be changed within an ACS installation. The url attribute
identifies the authoritative distribution point for the package
(specifically, a directory from which all versions of the package can be
obtained). It also serves as the package's universally unique identifier
and therefore cannot be changed.
All other properties of the package are stored as attributes and child
elements of the <version> element, since they can vary from version to
version. The <version> element also has two attributes: name and url. The
name attribute is actually a version number that conforms to the numbering
convention defined below. It is called name instead of number because it
can be alphanumeric, not purely numeric. The name attribute also designates
the maturity of the package: development, alpha, beta, or release. As with
the <package> element, the url attribute identifies the authoritative
distribution point for the specified version of the package (specifically,
the location of an actual package file that can be downloaded) and serves
as the package version's universally unique identifier.
The <version> element contains:
- One <package-name> element, which is a pretty name for the package
- One or more <owner> elements, each of which identifies a party
responsible for maintenance of the package
- One <summary> element
- One <description> element (optional)
- One <release-date> element
- One <vendor> element (optional), which identifies the organization that
maintains the package
- Zero or more <provides> elements, each of which identifies an interface
provided by the package
- Zero or more <requires> elements, each of which identifies an interface
upon which the package depends
- One <files> element, containing one <file> element for each file
included in the package
- One or more <parameter> elements that specify the package's
configuration interface [ACS4]
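As a rough illustration of how a tool might read these elements, here is a minimal Python sketch that parses an abbreviated copy of the ACS Core excerpt shown above. The function name read_spec and the dictionary keys are hypothetical; this is not the code APM itself uses.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the ACS Core excerpt from this document.
SPEC = """<?xml version="1.0"?>
<package key="acs-core" url="http://software.arsdigita.com/packages/acs-core">
  <version name="3.3.0" url="http://software.arsdigita.com/packages/acs-core-3.3.0.apm">
    <package-name>ACS Core</package-name>
    <owner url="mailto:jsalz@mit.edu">Jon Salz</owner>
    <summary>Routines and data models providing the foundation for ACS-based Web services.</summary>
    <release-date>2000-06-03</release-date>
    <vendor url="http://www.arsdigita.com/">ArsDigita Corporation</vendor>
    <provides url="http://software.arsdigita.com/packages/developer-support/tcl-api" version="0.2d"/>
    <files>
      <file type="tcl_procs" path="00-proc-procs.tcl"/>
      <file type="tcl_procs" path="10-database-procs.tcl"/>
    </files>
  </version>
</package>
"""


def read_spec(xml_text):
    """Return the main properties of a package specification as a dict.

    Hypothetical helper: the key names below are chosen for this example.
    """
    package = ET.fromstring(xml_text)
    version = package.find("version")
    return {
        "key": package.get("key"),
        "package_url": package.get("url"),
        "version_name": version.get("name"),
        "version_url": version.get("url"),
        "package_name": version.findtext("package-name"),
        "owners": [owner.text for owner in version.findall("owner")],
        "provides": [(p.get("url"), p.get("version")) for p in version.findall("provides")],
        "requires": [(r.get("url"), r.get("version")) for r in version.findall("requires")],
        "files": [(f.get("type"), f.get("path")) for f in version.findall("files/file")],
    }


spec = read_spec(SPEC)
print(spec["key"], spec["version_name"], spec["owners"])
```

Optional elements that are absent (here, <requires>) simply come back as empty lists.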
A <provides> or <requires> element identifies an interface with the
combination of its url and version attributes, where url is a universally
unique identifier for the interface (API or UI) and version is an
identifier that conforms to the same version numbering convention used for
packages. The convention for constructing an interface URL is:
http://vendor-host/packages/logical-name/implementation-type
In the above example, the vendor-host is software.arsdigita.com, the
logical-name is developer-support, and the implementation-type is tcl-api.
Other implementation-type values include plsql-api, sql-views, and
java-api. (At this time, the result of visiting an interface URL is
undefined; in the future, it will display the documentation for the
identified interface.)
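The URL convention can be sketched in a few lines of Python. The two function names below are hypothetical, and the check that the path has exactly three segments is an assumption drawn from the convention as stated, not a rule taken from APM.

```python
from urllib.parse import urlsplit


def build_interface_url(vendor_host, logical_name, implementation_type):
    """Assemble an interface URL from its three parts."""
    return f"http://{vendor_host}/packages/{logical_name}/{implementation_type}"


def split_interface_url(url):
    """Return (vendor_host, logical_name, implementation_type) for an interface URL."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Assumption: the path is exactly packages/logical-name/implementation-type.
    if len(segments) != 3 or segments[0] != "packages":
        raise ValueError(f"not an interface URL: {url!r}")
    return parts.netloc, segments[1], segments[2]


url = build_interface_url("software.arsdigita.com", "developer-support", "tcl-api")
print(url)
print(split_interface_url(url))
```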
Once an interface is published in a <provides> element, future versions of
the package must maintain that interface, i.e., no changes can be made to
the interface or its implementation that would cause dependent code to
break. The interface can be augmented, in which case the version number
should be incremented, i.e., a later version of an interface is always a
superset of an earlier version. To communicate the fact that an
incompatible change has been made to an interface, the package owner will
remove the original <provides> element and add a new, different <provides>
element, e.g., hypothetically, we might someday replace
developer-support/tcl-api with developer-support/tcl-api-2.
Also, a <provides> element can include a deprecated attribute, meaning
that the package owner expects to remove the corresponding interface in
the future.
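Because a later version of an interface is always a superset of an earlier one, a dependency check can treat a <requires> element as satisfied by any <provides> element with the same url and an equal or later version. The Python sketch below shows that idea under simplifying assumptions: the function names and example.com URLs are hypothetical, and the default ordering handles only purely numeric versions (a full ordering would follow the version numbering convention defined below).

```python
def numeric_key(version):
    """Order purely numeric versions such as '1.0.1' (no d/a/b suffix)."""
    return tuple(int(part) for part in version.split("."))


def unmet_requirements(requires, provided, version_key=numeric_key):
    """Return the (url, version) pairs in `requires` that nothing satisfies.

    `provided` holds (url, version) pairs taken from the <provides>
    elements of all installed packages.
    """
    # Remember the latest provided version of each interface URL.
    best = {}
    for url, version in provided:
        if url not in best or version_key(version) > version_key(best[url]):
            best[url] = version
    return [
        (url, version)
        for url, version in requires
        if url not in best or version_key(best[url]) < version_key(version)
    ]


# Hypothetical interfaces, for illustration only.
provided = [("http://example.com/packages/acl/tcl-api", "1.2")]
requires = [
    ("http://example.com/packages/acl/tcl-api", "1.0"),
    ("http://example.com/packages/alerts/plsql-api", "1.0"),
]
print(unmet_requirements(requires, provided))
```

An interface that was replaced after an incompatible change (tcl-api versus tcl-api-2 in the example above) has a different URL, so it never satisfies a requirement on the old one.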
Version Numbering Convention
A version number consists of:
- A major version number.
- Optionally, up to three minor version numbers.
- One of the following:
- The letter d, indicating a development-only version (i.e., definitely broken)
- The letter a, indicating an alpha release (i.e., probably broken)
- The letter b, indicating a beta release (i.e., somewhat broken)
- No letter at all, indicating a final release (i.e., not broken or, realistically, broken a little)
In addition, the letters d, a, and b may be followed by another integer,
indicating a version within the release.
For those who like regular expressions:
version_number := integer ('.' integer){0,3} (('d'|'a'|'b') integer?)?
So the following is a valid progression for version numbers:
0.9d, 0.9d1, 0.9a1, 0.9b1, 0.9b2, 0.9, 1.0, 1.0.1, 1.1b1, 1.1
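The grammar and the sample progression can be checked with a short Python sketch. The regular expression is a direct transcription of the grammar above; the sort key is an assumption that happens to reproduce the sample progression (missing minor numbers are treated as zero, and a letter without a trailing integer sorts before the same letter with one). The name version_key is hypothetical.

```python
import re

# integer ('.' integer){0,3} (('d'|'a'|'b') integer?)?
VERSION_RE = re.compile(r"(\d+(?:\.\d+){0,3})(?:([dab])(\d+)?)?")

# Maturity order implied by the text: development < alpha < beta < final.
MATURITY_RANK = {"d": 0, "a": 1, "b": 2, None: 3}


def version_key(name):
    """Return a tuple that sorts version numbers in release order."""
    match = VERSION_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"invalid version number: {name!r}")
    numbers, letter, serial = match.groups()
    parts = [int(n) for n in numbers.split(".")]
    parts += [0] * (4 - len(parts))  # assumption: "1.0" equals "1.0.0.0"
    # Assumption: a missing serial sorts before an explicit one (0.9d < 0.9d1).
    return (tuple(parts), MATURITY_RANK[letter], int(serial) if serial else -1)


progression = ["0.9d", "0.9d1", "0.9a1", "0.9b1", "0.9b2",
               "0.9", "1.0", "1.0.1", "1.1b1", "1.1"]
print(sorted(reversed(progression), key=version_key) == progression)
```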
Distribution Format: The .apm file
In Maximum RPM, Edward Bailey writes:
"Normally, package management systems take all the various files
containing programs, data, documentation, and configuration
information, and place them in one specially formatted file -- a
package file."
This description fits APM packages, which are distributed as
gzip-compressed tarfiles, with the special extension .apm. The full naming
convention for APM package files is:
package-key-package-version-name.apm
For instance, the first production release of the ACS Core package is
named acs-core-3.3.0.apm.
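A small Python sketch of the naming convention follows. Both function names are hypothetical. Splitting at the last hyphen relies on the observation that a version name, per the numbering convention above, never contains a hyphen, whereas a package key such as acs-core may.

```python
def apm_file_name(package_key, version_name):
    """Build the distribution file name for a package version."""
    return f"{package_key}-{version_name}.apm"


def split_apm_file_name(file_name):
    """Return (package_key, version_name) for a name like 'acs-core-3.3.0.apm'."""
    if not file_name.endswith(".apm"):
        raise ValueError(f"not an .apm file name: {file_name!r}")
    stem = file_name[: -len(".apm")]
    # Version names contain no hyphens, so the last hyphen is the separator.
    package_key, separator, version_name = stem.rpartition("-")
    if not separator or not package_key or not version_name:
        raise ValueError(f"cannot split key and version: {file_name!r}")
    return package_key, version_name


print(apm_file_name("acs-core", "3.3.0"))
print(split_apm_file_name("acs-core-3.3.0.apm"))
```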
Inside the tarfile, there is one directory at the top level, with the
same name as the package key, which, in turn, contains:
- an optional www directory, in which the implementation of the package's
UI (if any) resides
- zero or more Tcl scripts that are loaded when the server starts. Files
ending in -procs.tcl define Tcl procedures; files ending in -init.tcl
contain code to be run at initialization time (e.g., filter registration).
- zero or more SQL files (any files in the directory with a .sql
extension) that contain the DDL statements to install the package's
database schema and/or the package's database-resident API (views, stored
procedures, stored functions)
- zero or more SQL files, each of which upgrades the package's database
schema from one version to a later version (not necessarily the next
version, if no upgrades were needed for intervening versions) and is named
according to the convention:
upgrade-version-name-next-version-name.sql
(If any of these files are present, they will be located in an upgrade
subdirectory.)
- a documentation file named package-key.html or package-name.adp, or a
doc subdirectory containing multiple documentation files
- The package specification file, named package-key.info
Aside from the package specification, all items listed above are
optional.
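To make the layout concrete, the following Python sketch builds a tiny gzip-compressed tarfile in memory and checks the two structural rules stated above: a single top-level directory named after the package key, and a package-key.info file inside it. The helper names and the placeholder file contents are hypothetical; real APM tooling may check more or differently.

```python
import io
import tarfile


def make_sample_apm(package_key, file_names):
    """Build a small gzip-compressed tarfile in memory, for demonstration."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name in file_names:
            data = b"placeholder\n"
            info = tarfile.TarInfo(f"{package_key}/{name}")
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def check_apm_layout(apm_bytes, package_key):
    """Return a list of layout problems; an empty list means none were found."""
    problems = []
    with tarfile.open(fileobj=io.BytesIO(apm_bytes), mode="r:gz") as archive:
        names = archive.getnames()
    top_level = {name.split("/", 1)[0] for name in names}
    if top_level != {package_key}:
        problems.append(f"expected one top-level directory {package_key!r}")
    if f"{package_key}/{package_key}.info" not in names:
        problems.append("missing package specification file")
    return problems


good = make_sample_apm("bboard", ["bboard.info", "bboard.sql", "www/index.adp"])
bad = make_sample_apm("bboard", ["bboard.sql"])
print(check_apm_layout(good, "bboard"))
print(check_apm_layout(bad, "bboard"))
```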
ACS Directory Structure
APM installs packages in the packages subdirectory of the server root
directory, at the same level as the legacy www, tcl, and parameters
directories (which, by the way, continue to serve the same purposes as
they did in versions of ACS prior to 3.2; we may remove some of this
backward-compatibility in ACS 4).
Thus, the directory structure of the hypothetical ACS 3.3 installation
that is illustrated in the diagram above would look something like
this:
server-root/
|
+-- packages/
|
+-- acs-core/
|
+-- bboard/
| |
| +-- doc/
| | |
| | +-- index.html
| | |
| | +-- ...
| |
| +-- www/
| | |
| | +-- admin/
| | | |
| | | +-- index.adp
| | | |
| | | +-- ...
| | |
| | +-- index.adp
| | |
| | +-- ...
| |
| +-- bboard.info
| |
| +-- bboard.sql
| |
| +-- bboard-init.tcl
| |
| +-- bboard-procs.tcl
| |
| +-- ...
|
+-- ecommerce/
|
+-- ...
Another component of the ACS Core package, the Request Processor, is
responsible for making the various package user interfaces integrate into
one coherent hierarchy of URLs. The basic algorithm used to translate a
URL into a filesystem path is simple: "When an HTTP request for
/package-key/filename is received, then return the file
server-root/packages/package-key/www/filename."
(In reality, the job of the Request Processor is not so simple.)
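The basic rule quoted above can be written as a few lines of Python. This is only a sketch of the simplified rule, not of the real Request Processor; the function name url_to_path and the server root used in the example are hypothetical.

```python
import posixpath


def url_to_path(server_root, url_path):
    """Map /package-key/filename to server-root/packages/package-key/www/filename."""
    if not url_path.startswith("/"):
        raise ValueError(f"expected an absolute URL path: {url_path!r}")
    # normpath resolves "." and ".." so the result cannot escape the package.
    segments = posixpath.normpath(url_path).strip("/").split("/")
    if len(segments) < 2:
        raise ValueError(f"no package-relative file in: {url_path!r}")
    package_key, rest = segments[0], segments[1:]
    return posixpath.join(server_root, "packages", package_key, "www", *rest)


print(url_to_path("/web/server-root", "/bboard/admin/index.adp"))
```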
Changes From ACS 3.2 and Prior Versions
Prior to the introduction of APM in ACS 3.3, the contents of a given
package were scattered throughout the site's physical structure:
- the Tcl library scripts for all packages were located in the
server-root/tcl directory
- the UI pages for all packages were located in the directory structure
beneath the page root (server-root/www), which translated directly into
the site's URL hierarchy
- the data model files for all packages were located in the
server-root/www/doc/sql directory
In contrast, APM imposes a vertical organization wherein the
filesystem does not map directly to the URL hierarchy. The main
advantage of the pre-APM filesystem organization was the fact that,
given a URL, you always knew where to look for the corresponding file
under the page root. In our judgement, the benefit of having the
filesystem explicitly preserve the modularity of installed packages
outweighs both the loss of this advantage and the extra complexity that is
now built into the Request Processor.
Future Improvements
- Implement aforementioned configuration facility.
- Adjust design and implementation to work with forthcoming
Parties/Subcommunities model.
- Implement installation chaining, i.e., installing one package causes
any required packages that are not installed to be installed, if they
can be obtained. (The FreeBSD ports collection does this.)
- Implement composite packages, i.e., packages that contain other
packages. There is already stub support for this. Installation
chaining may actually make this superfluous.
- Compliance with XML Namespaces (http://www.w3.org/TR/REC-xml-names/);
may provide a standard way to solve the namespace collision problem
that the key attribute of the package element is designed to address.
- A method for explicit definition of interfaces (i.e., mapping a UI
identifier to be a set of URLs or an API identifier to a set of
procedure/function signatures) and, potentially, automated detection
of incompatibility
- Consider a suffix other than .info for package specifications: perhaps
just .xml?
- Documentation improvements:
- Write a formal DTD for APM package specifications.
- User experience documentation: for each class of user, what
questions can be asked, what actions performed.
- API documentation
- Add examples of how interfaces can be broken
- Document the integration of CVS and APM (specifically regarding
imported packages vs. locally developed packages)
- Documentation browser for installed packages
- Consider moving a separate api directory
- Clarify the rules that map files in packages to URLs; what follows
is preserved from an earlier version of this document:
The distribution file containing a package is rooted at the server root, so
(for instance) one might find the file packages/address-book/address-book.html
in the package. If for some reason a package needs to contribute a file to the
global www directory rather than its package-private one, the package could
just contain the file www/foo/bar.tcl; this file would be installed into the
site-wide www directory.
Package distribution files can contain files in other packages' directories;
this flexibility will be useful in case a package needs to augment another
package by providing extra services. For instance, a package providing
attachment support for the address book might contain a
packages/address-book/www/view-attachments.tcl file. However, it could not
contain a new packages/address-book/www/index.tcl file: we allow a file to
belong to only one package. (To provide a "hook" to the attachment package,
the address book could use a Package Manager API to determine whether the
attachment package is installed, displaying a link to view-attachments.tcl
only in that case.)
Under the Hood
At startup, the ACS Core scans all package specifications and
synchronizes them with the database. Mismatches (indicating that new
packages have been installed) will result in appropriate action
(running upgrade scripts or notifying the administrator).
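The startup comparison might look roughly like the Python sketch below. The text above does not say which kind of mismatch triggers which action, so the mapping used here (a changed version leads to upgrade scripts, an unknown package leads to a notification) is purely an assumption, as are the function name, the action labels, and the bboard and ecommerce version numbers.

```python
def plan_startup_actions(specs_on_disk, registry):
    """Compare {package_key: version_name} maps and list what needs attention.

    specs_on_disk: versions found in package specification files.
    registry: versions recorded in the database.
    """
    actions = []
    for key, version in sorted(specs_on_disk.items()):
        if key not in registry:
            # Assumption: a package unknown to the registry is reported.
            actions.append(("notify-administrator", key, version))
        elif registry[key] != version:
            # Assumption: a version mismatch means upgrade scripts must run.
            actions.append(("run-upgrade-scripts", key, version))
    return actions


on_disk = {"acs-core": "3.3.0", "bboard": "1.1", "ecommerce": "1.0"}
in_database = {"acs-core": "3.3.0", "bboard": "1.0"}
print(plan_startup_actions(on_disk, in_database))
```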
michael@arsdigita.com