File-Storage Design Specification
by Rob Storrs
I. Essentials
II. Introduction
We have our own file-storage application because we want all users
to be able to collaboratively maintain a set of documents.
Specifically, users can save their files to our server so that they
may:
- Organize files in a hierarchical directory structure
- Upload files via Web forms, using the file-upload feature of Web
browsers (optionally SSL-encrypted)
- Grab files that are served bit-for-bit by the server, without
any risk that a cracker-uploaded file will be executed as code
- Retrieve historical versions of a file
We want something that is relatively secure and that can be extended
and maintained by any ArsDigita programmer, i.e., something that
requires only AOLserver Tcl and Oracle skills.
III. Historical Considerations
File-storage was created to give non-technical users a way to
collaborate on a wide range of documents with minimal sysadmin
overhead. Specifically, it allowed clients to exchange design documents
(often MS Word, Adobe PDF, or other proprietary desktop file formats)
that changed frequently, without getting bogged down sifting through
multiple versions.
IV. Competitive Analysis
Why is a file-storage application useful?
If you simply give everyone FTP access to a Web-accessible directory,
you are running some big security risks. FTP is insecure and passwords
are transmitted in the clear. A cracker might sniff a password, upload
.pl and .adp pages, then grab those URLs from a Web browser. The cracker
is now executing arbitrary code on your server with all the privileges
that you've given your Web server.
The file-storage module is not a Web-based file system and cannot
fairly be compared against such systems. Its role is to provide a
simple Web location where users can share a versioned document. It
offers little functionality for aggregate file administration (e.g.,
selecting all files of a given type, or searching through specified
file types).
V. Design Tradeoffs
File vs. Folder Permissions?
A folder is treated as a type of file. Files are owned by a single user, but may contain versions created by authors other than the owner.
Permissions are attached only to files, not to folders, in order to simplify both the code and the user interface, i.e., to avoid questions like "Why can't any of
the people in my group see my files?" whose answer is "Did you notice that someone changed the permissions of the parent of the parent of the parent folder
of this file?" However, the system is easy to extend so that folders have their own permissions.
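The difference between the two models can be sketched in a few lines of Python. Each item carries a single hypothetical public_p flag standing in for whatever permission settings a file actually has; with file-only permissions the check looks at that item alone, while with inherited folder permissions it must walk every ancestor folder. The Node class and the one-flag permission model are simplifying assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Hypothetical file-or-folder record; a folder is a file with folder_p=True.
    name: str
    public_p: bool
    parent: Optional["Node"] = None
    folder_p: bool = False


def visible_file_only(node: Node) -> bool:
    # The design chosen above: only the item's own setting matters.
    return node.public_p


def visible_inherited(node: Node) -> bool:
    # The rejected alternative: every ancestor folder must also allow access,
    # so a change three levels up can silently hide this file.
    current: Optional[Node] = node
    while current is not None:
        if not current.public_p:
            return False
        current = current.parent
    return True
```

With a private top-level folder, the file-only check still shows a public file inside it, while the inherited check hides it; the second outcome is exactly what produces the "parent of the parent" support question.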
Full Text Indexing
Full-text indexing of files within the file-storage system is available if you're running Oracle 8i (8.1.5 or later). It requires building an
Intermedia text index (ConText) on the contents of file versions.
Intermedia incorporates very smart filtering software so that it can
grab the text from within HTML, PDF, Word, Excel, etc. documents. It is
also smart enough to ignore JPEGs and other pure binary formats.
Steps to using Intermedia:
- install Intermedia (Oracle dbadmin hell)
- get Intermedia's optional "INSO filtering" system to work. Here's
what jsc@arsdigita.com had to say about his experience doing this...
I got the INSO stuff working. The major holdup was that you have to
configure listener.ora to have $ORACLE_HOME/ctx/lib in
LD_LIBRARY_PATH. The docs mumble something about editing listener.ora,
but a careful perusal of anything having to do with networking setup
didn't turn up any examples. The networking assistant program has a
field for "Environment", but when you try to put anything in there, the
program hits a null pointer exception when you go to save it and doesn't
write anything. I "solved" this eventually by just symlinking all the
.so files in ctx/lib into $ORACLE_HOME/lib, which is already in the
LD_LIBRARY_PATH for the listener.
- To keep the interMedia index current as documents are added or
updated, either synchronize it explicitly with
alter index indexname rebuild online parameters ('sync')
or run the ctxsrv process, which periodically updates all interMedia
indexes:
ctxsrv -user ctxsys/ctxpassword
If you use ctxsrv, the shell that starts it must have
$ORACLE_HOME/ctx/lib in its LD_LIBRARY_PATH.
- uncomment the create index fs_versions_content_idx statement in
file-storage.sql (and then feed it to Oracle)
- set UseIntermediaP=1 in your ad.ini file
- restart AOLserver (so that it reads the new parameter setting)
Warning: Intermedia is a tricky product for users. The default mode is
exact phrase matching, which means that the more a user types the fewer
search results will be returned (a violation of the user interface
guidelines in
developers). So you
might be letting yourself in for some education of users...
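The effect is easy to see with a toy in-memory model in Python. It does not call interMedia; it only mimics the two matching rules on whitespace-separated words. Exact-phrase matching is much stricter than requiring all of the words in any order, so a longer, more descriptive query quickly drops to zero hits. The function names and sample documents are hypothetical.

```python
def _tokens(text: str) -> list[str]:
    # Lower-case, whitespace-separated words.
    return text.lower().split()


def exact_phrase_match(doc: str, query: str) -> bool:
    # The default mode described above: the query words must appear
    # contiguously and in order.
    d, q = _tokens(doc), _tokens(query)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


def all_words_match(doc: str, query: str) -> bool:
    # A more forgiving rule: every query word appears somewhere in the document.
    words = set(_tokens(doc))
    return all(w in words for w in _tokens(query))


docs = [
    "quarterly budget report for 1999",
    "budget draft",
    "report on the 1999 budget",
]
```

For the query "budget report", exact-phrase matching finds only the first document, while the all-words rule also finds the third.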
Deletion of Files
Only an administrator can actually delete a file from storage within
the database, thereby freeing up disk space. A user-level deletion
only hides the file from view (by setting the deleted_p flag).
From the user's perspective, the file has been deleted from the
system. As a result, users may be less careful about storage use than
they would be if they understood how deletion actually works.
This arrangement lets administrators recover files that users
deleted inadvertently, but it also means that reclaiming disk space
requires administrator involvement.
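A minimal Python sketch of this two-stage deletion, with an in-memory dictionary standing in for the files table and a boolean standing in for the deleted_p column. The class and method names are hypothetical; only the hide-then-purge behavior comes from the text.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    file_id: int
    n_bytes: int
    deleted_p: bool = False  # stands in for the deleted_p flag


@dataclass
class Storage:
    files: dict[int, FileRecord] = field(default_factory=dict)

    def user_delete(self, file_id: int) -> None:
        # User-level "delete": hide the file but keep its bytes.
        self.files[file_id].deleted_p = True

    def user_listing(self) -> list[int]:
        # What users see: everything not flagged as deleted.
        return sorted(f.file_id for f in self.files.values() if not f.deleted_p)

    def admin_undelete(self, file_id: int) -> None:
        # Administrators can recover an inadvertently deleted file.
        self.files[file_id].deleted_p = False

    def admin_purge(self) -> int:
        # Only this step frees space; returns the number of bytes reclaimed.
        doomed = [f for f in self.files.values() if f.deleted_p]
        for f in doomed:
            del self.files[f.file_id]
        return sum(f.n_bytes for f in doomed)

    def bytes_used(self) -> int:
        return sum(f.n_bytes for f in self.files.values())
```

Note that bytes_used() does not change when a user deletes a file; it drops only after admin_purge().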
VI. Data Model Discussion
The file-storage system is built around a data model consisting of two
tables, one for files and a second for versions. A folder is treated as
a type of file. Files are owned by a single user, but may contain
versions created by authors other than the owner.
Indices on the file ids are required for the CONNECT BY queries used
for ordering the files for display.
The view fs_files_tree simplifies walking the tree in Oracle.
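For readers without an Oracle instance, the shape of this data model can be sketched with Python's built-in sqlite3 module. SQLite has no CONNECT BY, so a recursive common table expression stands in for it and computes the depth that Oracle exposes as LEVEL. The table names fs_files and fs_versions, all column names, and the fs_files_parent_idx index are assumptions; only the two-table split and the folder tree come from the text.

```python
import sqlite3

# Hypothetical schema: one table for files (folders included), one for versions.
SCHEMA = """
create table fs_files (
    file_id    integer primary key,
    parent_id  integer references fs_files(file_id),
    file_title text not null,
    owner_id   integer not null,
    folder_p   text not null default 'f'
);
create table fs_versions (
    version_id integer primary key,
    file_id    integer not null references fs_files(file_id),
    author_id  integer not null,
    n_bytes    integer not null
);
-- the tree walk joins on parent_id, so index it
create index fs_files_parent_idx on fs_files(parent_id);
"""

# Recursive CTE standing in for CONNECT BY; char(1) is a low separator so
# each child sorts directly under its parent when ordering by path.
TREE_QUERY = """
with recursive tree(file_id, file_title, depth, path) as (
    select file_id, file_title, 1, file_title
      from fs_files where parent_id is null
    union all
    select f.file_id, f.file_title, t.depth + 1,
           t.path || char(1) || f.file_title
      from fs_files f join tree t on f.parent_id = t.file_id
)
select file_id, file_title, depth from tree order by path
"""


def walk_tree(conn: sqlite3.Connection) -> list[tuple[int, str, int]]:
    return conn.execute(TREE_QUERY).fetchall()
```

Note that versions live in their own table, so a version by a non-owner author is just a row in fs_versions pointing at the owner's file.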
VII. Legal Transactions
/file-storage/
- Create a folder
- Upload a file
- "Delete" a file (actually hides it)
- Upload a newer version of a file
- Download a version of a file
/admin/file-storage/
- View system usage
- Delete files
- Edit files
VIII. API
PL/SQL procedures
none
Tcl procedures
fs_check_edit_p user_id version_id [ group_id ]
Returns 1 if the user has permission to edit the version of the file; 0 otherwise.
- Parameters: user_id, version_id, group_id (optional)
fs_check_read_p user_id version_id [ group_id ]
Returns 1 if the user can read the version of the file; 0 otherwise.
- Parameters: user_id, version_id, group_id (optional)
fs_check_write_p user_id version_id [ group_id ]
Returns 1 if the user can write the file; 0 otherwise.
- Parameters: user_id, version_id, group_id (optional)
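The check procedures above share a convention: return 1 or 0 rather than raising an error. A minimal Python sketch of that convention follows. The access rules (owner or public for reading, owner or group member for writing) are assumptions made for illustration, since this document does not spell them out, and check_read_p / check_write_p are hypothetical stand-ins that take a record rather than a version_id.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    # Hypothetical view of what a permission check would look up.
    version_id: int
    owner_id: int
    public_p: bool
    group_id: Optional[int] = None


def check_read_p(user_id: int, version: Version, user_groups: set[int]) -> int:
    """Return 1 if the user may read the version, 0 otherwise (assumed rules)."""
    if version.owner_id == user_id or version.public_p:
        return 1
    if version.group_id is not None and version.group_id in user_groups:
        return 1
    return 0


def check_write_p(user_id: int, version: Version, user_groups: set[int]) -> int:
    """Return 1 if the user may write the file, 0 otherwise (assumed rules)."""
    # Assumption: a public file is readable by all but writable only by its
    # owner or by members of its group.
    if version.owner_id == user_id:
        return 1
    if version.group_id is not None and version.group_id in user_groups:
        return 1
    return 0
```

Because permissions live on files only, each check consults a single record; no walk up the folder tree is needed.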
fs_date_picture
Returns the date picture (format model) to use with Oracle's TO_CHAR function, taken from the DatePicture parameter in the ad.ini file.
fs_folder_box user_id topmost_option
Returns the folder box.
- Parameters: user_id (the user who is logged in), topmost_option (the option that should appear at the top)
fs_folder_def_selection user_id [ group_id ] [ public_p ] [ file_id ] \
[ folder_default ]
Write out the SELECT box that allows the user to move a file to
another folder, or - if folder_default is provided - create a new
folder.
- Parameters: user_id, group_id (optional), public_p (optional), file_id (optional), folder_default (optional)
fs_folder_selection user_id [ group_id ] [ public_p ] [ file_id ]
Write out the SELECT box that allows the user to move a file to another folder.
- Parameters: user_id, group_id (optional), public_p (optional), file_id (optional)
fs_guess_source public_p owner_id group_id local_user_id
Given some information about a file, tries to guess in which subtree the file belongs. Mainly used by one-file.tcl.
- Parameters: public_p, owner_id, group_id, local_user_id
fs_header_row_for_files [ -title title ] [ -author_p author_p ]
Returns a table header row containing column names appropriate for
a listing of files alone (i.e., not versions of files): Name, Size,
Type, and Modified.
If you set author_p to 1, you'll additionally get an Author column.
- Switches: -title (optional), -author_p (defaults to "0")
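A minimal Python sketch of the header row described above. The four base columns come from the text; placing Author at the end is an assumption, and -title is left out because its exact effect is not described here.

```python
def header_row_for_files(author_p: bool = False) -> str:
    """Hypothetical sketch: an HTML header row for a plain file listing."""
    columns = ["Name", "Size", "Type", "Modified"]
    if author_p:
        # Assumption: the extra Author column goes last.
        columns.append("Author")
    cells = "".join(f"<th>{c}</th>" for c in columns)
    return f"<tr>{cells}</tr>"
```

Calling header_row_for_files() gives a four-column row; passing author_p=True adds a fifth.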
fs_order_files [ user_id ] [ group_id ] [ public_p ]
Set the ordering and depth for the files so that they can be displayed quickly.
- Parameters: user_id (optional), group_id (optional), public_p (optional)
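fs_order_files is described only as precomputing ordering and depth. A minimal Python sketch of that idea: one depth-first walk assigns each item a sort_key and a depth, so later listings just sort by sort_key and indent by depth. The Item class, alphabetical order among siblings, and roots at depth 0 are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    file_id: int
    parent_id: Optional[int]  # None for top-level items
    title: str
    sort_key: int = 0  # display position, filled in by order_files
    depth: int = 0     # nesting level (roots are 0), filled in by order_files


def order_files(items: list[Item]) -> list[Item]:
    """Precompute sort_key and depth; returns the items in display order."""
    children: dict[Optional[int], list[Item]] = {}
    for item in items:
        children.setdefault(item.parent_id, []).append(item)
    for siblings in children.values():
        siblings.sort(key=lambda i: i.title.lower())  # assumed: alphabetical

    ordered: list[Item] = []
    # Iterative depth-first walk; pushing in reverse keeps siblings in order.
    stack = [(item, 0) for item in reversed(children.get(None, []))]
    while stack:
        item, depth = stack.pop()
        item.depth = depth
        item.sort_key = len(ordered)
        ordered.append(item)
        for child in reversed(children.get(item.file_id, [])):
            stack.append((child, depth + 1))
    return ordered
```

Doing the walk once and storing the results trades a little work on each change for cheap, repeated displays.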
fs_pretty_file_type mime_type
Takes a MIME type and returns a string to be displayed for that type.
- Parameters: mime_type
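A minimal Python sketch of a MIME-type-to-label mapping like the one fs_pretty_file_type performs. The table entries, the generic "Image" label, and the fallback to the raw type are hypothetical; the real list of types is not given in this document.

```python
# Hypothetical display names; the module's actual table is not listed here.
PRETTY_TYPES = {
    "application/msword": "Microsoft Word",
    "application/pdf": "PDF",
    "text/html": "HTML",
    "text/plain": "Text",
}


def pretty_file_type(mime_type: str) -> str:
    # Drop parameters such as "; charset=..." and normalize case.
    base = mime_type.split(";", 1)[0].strip().lower()
    if base in PRETTY_TYPES:
        return PRETTY_TYPES[base]
    # Group all image types under one label.
    if base.split("/", 1)[0] == "image":
        return "Image"
    # Otherwise show the raw type, or a placeholder when none was recorded.
    return base or "Unknown"
```

For example, "text/html; charset=iso-8859-1" maps to "HTML", and an unrecognized type is shown as-is.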
fs_row_for_one_file [ -n_pixels_in n_pixels_in ] [ -file_id file_id ] \
[ -folder_p folder_p ] [ -client_file_name client_file_name ] \
[ -n_kbytes n_kbytes ] [ -n_bytes n_bytes ] \
[ -file_title file_title ] [ -file_type file_type ] [ -url url ] \
[ -creation_date creation_date ] [ -version_id version_id ] \
[ -links links ] [ -author_p author_p ] [ -owner_id owner_id ] \
[ -owner_name owner_name ] [ -user_url user_url ] \
[ -export_url_vars export_url_vars ] [ -folder_url folder_url ] \
[ -file_url file_url ]
Returns one row of an HTML table displaying all the information about a file.
Set links to 0 if you want this file to be output without links to manage it
(to display the folder you're currently in).
A little explanation is in order here.
The first bunch of arguments are all standard information we want to
know about the file. n_pixels_in is the number of pixels by which
you want this row indented.
Then there's the 'links' argument. It's used by one-folder, which
likes to show the current folder first, without hyperlinks. So
if you don't want links from an entry (this only works for folders),
set links to 0.
Then there's the author. If you want the author shown, set author_p
and provide owner_id and owner_name, and you'll get a link to the
author. If you want the link to go somewhere other than
/shared/community-member, set user_url to the page
you want to link to (user_id will be appended).
Set export_url_vars to the variables you want exported when a
file or folder link is clicked; it should be a query-string
fragment. If you're unhappy with the default URLs 'one-folder' or
'one-file' (say, you're implementing admin pages where they're
named differently), change them with folder_url and file_url. The
export_url_vars will be appended to these URLs.
- Switches:
-n_pixels_in (defaults to "0")
-file_id (optional)
-folder_p (defaults to "f")
-client_file_name (optional)
-n_kbytes (optional)
-n_bytes (optional)
-file_title (optional)
-file_type (optional)
-url (optional)
-creation_date (optional)
-version_id (optional)
-links (defaults to "1")
-author_p (defaults to "0")
-owner_id (defaults to "0")
-owner_name (optional)
-user_url (defaults to "/shared/community-member")
-export_url_vars (optional)
-folder_url (defaults to "one-folder")
-file_url (defaults to "one-file")
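A reduced Python sketch of the URL and link rules described for fs_row_for_one_file, covering only the name cell and the author cell (size, type, and date are omitted). The defaults match the switch list above; the "?file_id=N" and "?user_id=N" URL shapes and the padding-left indentation are assumptions, since the document says only that user_id and export_url_vars "will be appended".

```python
from html import escape


def row_for_one_file(file_id: int, file_title: str, folder_p: bool = False,
                     n_pixels_in: int = 0, links: bool = True,
                     author_p: bool = False, owner_id: int = 0,
                     owner_name: str = "",
                     user_url: str = "/shared/community-member",
                     export_url_vars: str = "",
                     folder_url: str = "one-folder",
                     file_url: str = "one-file") -> str:
    """Hypothetical sketch: an indented name cell plus an optional author cell."""
    # Assumed URL shape: <page>?file_id=N, then the export_url_vars fragment.
    href = f"{folder_url if folder_p else file_url}?file_id={file_id}"
    if export_url_vars:
        href += "&" + export_url_vars
    title = escape(file_title)
    # As documented, turning links off only takes effect for folders.
    show_link = links or not folder_p
    name = f'<a href="{escape(href)}">{title}</a>' if show_link else title
    cells = [f'<td style="padding-left: {int(n_pixels_in)}px">{name}</td>']
    if author_p:
        # Assumed: the owner's id is appended to user_url as ?user_id=N.
        author_href = escape(f"{user_url}?user_id={owner_id}")
        cells.append(f'<td><a href="{author_href}">{escape(owner_name)}</a></td>')
    return "<tr>" + "".join(cells) + "</tr>"
```

An admin page would reuse the same function with, for instance, folder_url and file_url pointing at its own one-folder and one-file pages.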
fs_user_contributions user_id purpose
For site administrators only: returns statistics and a link to a details page.
- Parameters: user_id, purpose
IX. User Interface
The user interface attempts to replicate the file system metaphors
familiar to most computer users, with folders containing files.
Adding files and adding folders are offered as hyperlinked options, and
a Web form handles searching.
Users can navigate to any specified document tree using a select box.
Files and folders available within a document tree are presented with
size, type, and modification date, alongside hyperlinks to the
appropriate actions for a given file.
X. Configuration/Parameters
Configuration of the system is kept to a minimum. The file-storage parameters live in this section of the ad.ini file:
; for the ACS File-Storage System
[ns/server/yourserver/acs/fs]
SystemName=File Storage System
SystemOwner=file-administrator@yourserver.com
DefaultPrivacyP=f
; do you want to maintain a public tree for site wide documents
PublicDocumentTreeP=1
MaxNumberOfBytes=2000000
DatePicture=MM/DD/YY HH24:MI
HeaderColor=#cccccc
FileInfoDisplayFontTag=<font face=arial,helvetica size=-1>
UseIntermediaP=0
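A minimal Python sketch of reading these parameters with the standard configparser module, as a stand-in for the AOLserver parameter lookup the module actually uses. The fallback values simply copy the sample above, and treating MaxNumberOfBytes as a per-upload size limit is an assumption based on its name.

```python
import configparser

# Trimmed copy of the sample ad.ini section above.
SAMPLE_INI = """\
; for the ACS File-Storage System
[ns/server/yourserver/acs/fs]
PublicDocumentTreeP=1
MaxNumberOfBytes=2000000
DatePicture=MM/DD/YY HH24:MI
UseIntermediaP=0
"""

SECTION = "ns/server/yourserver/acs/fs"


def load_fs_params(ini_text: str) -> dict:
    """Read the file-storage parameters; fallbacks mirror the sample values."""
    cfg = configparser.ConfigParser(interpolation=None)  # take '%' literally
    cfg.read_string(ini_text)
    fs = cfg[SECTION]
    return {
        # Oracle TO_CHAR date picture, the value fs_date_picture returns
        "date_picture": fs.get("DatePicture", "MM/DD/YY HH24:MI"),
        "max_bytes": fs.getint("MaxNumberOfBytes", 2000000),
        # getboolean accepts 1/0, yes/no, true/false, on/off (but not t/f)
        "public_tree_p": fs.getboolean("PublicDocumentTreeP", True),
        "use_intermedia_p": fs.getboolean("UseIntermediaP", False),
    }


def upload_allowed(n_bytes: int, params: dict) -> bool:
    # Assumption: MaxNumberOfBytes caps the size of a single upload.
    return n_bytes <= params["max_bytes"]
```

Note that DefaultPrivacyP=f uses a t/f value, which getboolean would reject; a reader for that key would need its own conversion.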
XI. Acceptance Tests
- Go to /file-storage/ and upload a file
- Create a folder and move the file into it
- Change the properties of a file
- Upload another version of the same file
- Delete a file from the system
- Delete a folder
XII. Future Improvements/Areas of Likely Change
- The administration section currently needs considerable work. Instead of trying to clean up /admin/file-storage/, we should build a better /file-storage/admin or even let administrators do more within /file-storage/.
- Ticket Tracker style column sorting. We want the ability to sort the contents of each folder by name, author, size, type and last modified. In addition, the folders should be able to sort among themselves by name. You should use something very similar to the procedure ad_table. The procedure that you use will be slightly different because the files will be sorted on a per folder basis instead of on a per table basis.
- Better organization of the folder tree: make the interface more like
a Windows-style interface. Add a + icon next to a folder that is open,
with all of its files visible, and a - icon next to a folder that is
closed and can be expanded. Clicking on the + sends the user back to
the same page with the folder's contents hidden and the - icon in place
of the +. Clicking on the - sends the user back to the same page with a
+ in place of the - and all of the files in the folder shown. Clicking
on the folder icon or name should work just as it does now.
- Nifty JavaScript version
- File viewer: allowing users to view multiple file formats within their browser.
- Email alerts on a folder, so that a user could get an alert whenever a new document is posted within a document tree or a specified folder.
XIII. Authors
rstorrs@arsdigita.com