Note that this is the same table that holds the dimensions for classifying content on a site. If this were not so, it would be painful to match up users and content.create table categories ( category_id integer not null primary key, category varchar(50) not null, category_description varchar(4000), -- e.g., for a travel site, 'country', or 'activity' -- could also be 'language' category_type varchar(50), -- language probably would weight higher than activity profiling_weight number default 1 check(profiling_weight >= 0), enabled_p char(1) default 't' check(enabled_p in ('t','f')), mailing_list_info varchar(4000) );
The category_type
column bears mentioning. It isn't used
for score computation but only for user interface when querying the user
or giving options for the site administrator. The
profiling_weight
column may be used to disable scoring
along a certain dimension or to make language preference much more
important than other preferences.
The mailing_list_info
is legacy column from time where the
predominant function of categorization system was to track down who was
spammed about what. It survived only to support some legacy code that
may still lurk around (note that current ACS no longer uses this column).
The category_description
can be used in other places where
a user simply wants clarification about a category.
category_type
system doesn't work for
you. You are running a travel site and want to present the "Appalachian
Trail" category to folks who have selected one of the states through
which the AT runs (Georgia, Tennessee, North Carolina, Virginia, West
Virginia, Maryland, Pennsylvania, New Jersey, New York, Connecticut,
Massachusetts, Vermont, New Hampshire, Maine).
You represent this hierarchy with the following table:
Notice that this isn't a strict hierarchy; a category can have more than one parent. Also notice that we use UNIQUE rather than PRIMARY KEY. That is because we signify the top-level hierarchies with a NULLcreate table category_hierarchy ( parent_category_id integer references categories, child_category_id integer regerences categories unique (parent_category_id, child_category_id) );
parent_category_id
column.
static_categories
(mapped static pages to categories).
To build user and admin interfaces when queryingcreate sequence site_wide_cat_map_id_seq; create table site_wide_category_map ( map_id integer primary key, category_id not null references categories, -- We are mapping a category in the categories table -- to another row in the database. Which table contains -- the row? on_which_table varchar(30) not null, -- What is the primary key of the item we are mapping to? -- With the bboard this is a varchar so we can't make this -- and integer on_what_id varchar(500) not null, mapping_date date not null, -- how strong is this relationship? -- (we can even map anti-relationships with negative numbers) mapping_weight integer default 5 check(mapping_weight between -10 and 10), -- A short description of the item we are mapping -- this enables us to avoid joining with every table -- in the ACS when looking for the most relevant content -- to a users' interests -- (maintain one_line_item_desc with triggers.) one_line_item_desc varchar(200) not null, mapping_comment varchar(200), -- only map a category to an item once unique(category_id, on_which_table, on_what_id) );
site_wide_category_map
, we use the central
table_acs_properties
, which is shared by side-wide index
and the general comments facility.
Here is an example entry for thecreate table table_acs_properties ( table_name varchar(30) primary key, section_name varchar(100) not null, user_url_stub varchar(200) not null, admin_url_stub varchar(200) not null );
bboard
table:
table_name
- bboard
section_name
- Discussion Forums
user_url_stub
- /bboard/q-and-a-fetch-msg?msg_id=
admin_url
- /bboard/admin-q-and-a-fetch-msg?msg_id=
table_name
reference the Oracle data dictionary view
user_tables
but this doesn't seem to work.
To build admin pages for inserting, updating, and deleting data in the
site_wide_category_map
table, you can use the Tcl procs:
ad_categorization_widget
ad_categorize_row
ad_categorization_widget
ad_categorization_widget
within an HTML form to
create a user interface for categorizing a specific row in the
database. It returns a selection widget that contains options for each
category; already mapped categories are pre-selected, and, for each
category, the category_type
(if one exists) is included
in parentheses. If a category hierarchy is defined, then indentation
makes it visually apparent.
Suppose that you're running a restaurant-rating site like ZAGAT.COM and you want to categorize
restaurants geographically. The ad_categorization_widget
for a restaurant with multiple locations like Nobuyuki Matsuhisa's
excellent Nobu
would look something like this:
ad_categorization_widget
takes the following parameters:
which_table
and what_id
: together,
these two parameters identify the specific row in the database
name
: value for the name
attribute of
the <select>
tag; "category_id_list" by default
multiple_p
: if true (which is the default), then the
widget will allow multiple selections
size
: the number of visible rows in the widget; if
omitted, then the size of the widget will be the number of categories
ad_categorization_widget
is declared with
ad_proc
so you must call it with named arguments, e.g.:
ad_categorization_widget -which_table "restaurants" -what_id 123
ad_categorize_row
which_table
and what_id
parameters) and the
list of categories to which that row should be mapped (the
category_id_list
parameter),
ad_categorize_row
will update
site_wide_category_map
appropriately, inserting any new
mappings and deleting any previously existing mappings to categories
that are not in category_id_list
. Thus, if
category_id_list
is empty, then all mappings to the row
will be deleted.
In addition to which_table
, what_id
, and
category_id_list
, ad_categorize_row
also
takes the following parameters:
one_line_item_desc
: see the description of the
corresponding column in site_wide_category_map
above
mapping_weight
: see the description of the
corresponding column in site_wide_category_map
above;
optional
mapping_comment
: see the description of the
corresponding column in site_wide_category_map
above;
optional
ad_categorization_widget
, you must call
ad_categorize_row
with named arguments, e.g.:
Note thatset category_id_list [util_GetCheckboxValues [ns_conn form] "category_id_list"] if { $category_id_list != 0 } { ad_categorize_row -which_table "restaurants" -what_id 123 \ -category_id_list $category_id_list -one_line_item_desc "Nobu" }
ad_categorize_row
will use the same
mapping_weight
(as well as
one_line_item_desc
and mapping_comment
) for
all the categories in category_id_list
, so you will need
to call it multiple times if you have differently weighted mappings.
For instance, suppose you want to categorize a travel article about
surfing near Santa Cruz with a mapping_weight
of 10 for
"Surfing" (because its great pictures and stories will be very
interesting to all surfers, even if the reader is nowhere near
California) but a mapping_weight
of 2 for "Santa Cruz,
CA" (because the average tourist will not find it interesting, not to
mention the fact that it's only near Santa Cruz, not
in Santa Cruz). You will have to call
ad_categorize_row
twice:
ad_categorize_row -which_table "travel_articles" -what_id 456 \ -category_id_list surfing_category_id -mapping_weight 10 \ -one_line_item_desc "Surfin' Near Santa Cruz" ad_categorize_row -which_table "travel_articles" -what_id 456 \ -category_id_list Santa_Cruz_CA_category_id -mapping_weight 2 \ -one_line_item_desc "Surfin' Near Santa Cruz"
As the comment above notes, the default behavior when a user signs up for a mailing list or whatever is to assign an interest level of 5. This means "moderately interested". If we ever find out that a user doesn't like something, we can record that with a negativecreate table users_interests ( user_id integer not null references users, category_id integer not null references categories, -- 0 is same as NULL; -10 is "hate this kind of stuff"; -- 5 is "said I liked it", 10 is "love this kind of stuff" interest_level integer default 5 check(interest_level between -10 and 10), interest_date date, unique(user_id, category_id) );
interest_level
.
Combining keyword and categorical searches:select one_line_item_description, user_url_stub, section_name from site_wide_category_map, categories, users_interests, table_acs_properties where users_interests.user_id = $user_id and categories.category_id = users_interests.category_id and site_wide_category_map.category_id = categories.category_id and site_wide_category_map.table_name = table_acs_properties.table_name group by one_line_item_description, user_url_stub, section_name order by (sum(categories.profiling_weight * users_interests.interest_level)) desc
select the_key, sum(search_score)+sum(category_score) as total_score, section_name, user_url_stub, one_line_description from (select site_wide_index.one_line_description, site_wide_index.the_key, site_wide_index.table_name, score(10) as search_score, 0 as category_score from site_wide_index where (contains(indexed_stuff,'about($user_entered_query)',10) > 0) union select site_wide_category_map.one_line_item_desc, on_what_id, on_which_table, 0 as search_score, sum(profiling_weight) as category_score from site_wide_category_map, categories where site_wide_category_map.category_id in ('[join $category_id_list "','"]') and site_wide_category_map.category_id = categories.category_id group by on_what_id, one_line_item_desc, on_which_table ) site_wide, table_acs_properties where site_wide.table_name = table_acs_properties.table_name group by the_key, one_line_description, section_name, user_url_stub order by total_score desc