Localization
Part of an article on Building a Multilingual Web Service Using the ACS, by John Lowry (lowry@arsdigita.com)
ArsDigita : ArsDigita Systems Journal : One article
A locale is the set of language and cultural rules used to
format dates, numbers, monetary amounts and other information. Where
appropriate, we need to localize the information that is displayed on
the web site. We also need to be able to accept input in web
forms that is in a localized format.
Localizing software is a well-documented problem. The ISO
14652 standard documents the various areas of a computer program
that need localization. In fact, the software on top of which ACS
runs, Oracle and Unix, already provides APIs to display
various information in a locale-specific way. However, neither of
these APIs can be conveniently called from within AOLserver, so we
have written an API which provides procedures for
reading and writing localized data.
Each procedure provided by our localization API needs to take at least
the following two types of input:
- The data that needs to be converted to or from a localized
format. What data are we going to be using on
a web site that requires localization? Here are the areas that we
consider most important:
- Numbers
Locale differences that affect numbers include the decimal separator
and thousands separator.
- Monetary amounts
Monetary amounts can have different locale rules from numbers. For
example, the number of digits allowable after the decimal point will
depend on the currency.
- Dates
Dates can be represented with different formatting strings. For
example, 7/4/2000 and July 4 2000 are two examples of
formatted dates, which are both localized for the en_US locale. We need to
be able to localize each formatting string.
Other types of data can be localized, such as address formats, phone numbers
and weights and measures. In the case of
measures, it would be necessary to store the measures in a canonical
form and convert to and from a localized form, such as meters or feet
when doing input or output. It would be possible to extend our
localization API to cope with these other types of data.
- The locale that will be used for the conversion. We can
represent the locale as a three-part string that includes the language,
country and dialect in the following format:
language_country_dialect
For a web site, however, it is usually sufficient to specify locales
without a dialect. We are unlikely to want to distinguish between, for
example, Scottish English and English English in
displaying data on a site. Therefore, we have specified a localization
API that represents a locale using just the language and country.
The language code is defined by the ISO
639 standard and the country is defined by the ISO 3166
standard. Examples of locales that we use include en_US
(United States English) and fr_FR (French as spoken in France).
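As a rough illustration of the two-part locale format, the following Tcl sketch splits a locale string into its language and country codes. The procedure name example_locale_parts is hypothetical and is not part of the ACS API. The check assumes two-letter lowercase language codes and two-letter uppercase country codes, as in the examples above.

```tcl
# Hypothetical helper (not part of the ACS API): split a locale string
# of the form language_country into its two parts.
proc example_locale_parts {locale} {
    # Assumption: two lowercase letters for the language (ISO 639)
    # and two uppercase letters for the country (ISO 3166).
    if {![regexp {^([a-z]{2})_([A-Z]{2})$} $locale -> language country]} {
        error "invalid locale \"$locale\": expected language_country, e.g. en_US"
    }
    return [list $language $country]
}

puts [example_locale_parts en_US]   ;# en US
puts [example_locale_parts fr_FR]   ;# fr FR
```

Rejecting malformed strings early keeps a bad locale from reaching the formatting procedures.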
The Localization API
The Tcl localization API includes a host of helper procedures but the
only ones that need be called are shown below:
Procedures for converting to and from localized versions of numbers:
- lc_numeric num fmt locale
- This procedure returns a number in a localized
format. It takes as input the following parameters:
- num is a canonical number.
- fmt is a format string used by the Tcl format
command. In most cases, this parameter should be an empty string.
- locale is the locale abbreviation that will be used to
convert the input number. It defaults to en_US.
- lc_parse_number num locale integer_only_p
- This procedure returns a canonical number, suitable for
manipulating in Tcl or inserting into a database. It takes as input
the following parameters:
- num is a localized number. If num is invalid, the
procedure throws an error.
- locale is the locale that the input number is formatted
with.
- integer_only_p restricts valid numbers to integers only if
this parameter is a true value. It defaults to a false value.
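The conversion in both directions can be sketched in a few lines of Tcl. This is an illustration only, not the ACS implementation. The names example_format_number and example_parse_number are hypothetical, and the separators are passed in explicitly where the real procedures would look them up from the locale. The sketch assumes groups of three digits and is lenient about where grouping separators appear in the input.

```tcl
# Hypothetical sketch: format a canonical number using the given
# decimal and thousands separators (groups of three digits assumed).
proc example_format_number {num decimal_point thousands_sep} {
    if {![regexp {^(-?)(\d+)(?:\.(\d+))?$} $num -> sign whole frac]} {
        error "not a canonical number: $num"
    }
    # Peel off three digits at a time from the right-hand end.
    set grouped ""
    while {[string length $whole] > 3} {
        set grouped "$thousands_sep[string range $whole end-2 end]$grouped"
        set whole [string range $whole 0 end-3]
    }
    set result "$sign$whole$grouped"
    if {$frac ne ""} {
        append result $decimal_point $frac
    }
    return $result
}

# Hypothetical sketch: turn a localized number back into canonical form.
proc example_parse_number {str decimal_point thousands_sep {integer_only_p 0}} {
    # Drop grouping separators and replace the decimal separator with ".".
    # Note: this does not check that the groups are three digits wide.
    set canonical [string map [list $thousands_sep "" $decimal_point "."] [string trim $str]]
    if {$integer_only_p} {
        set pattern {^-?\d+$}
    } else {
        set pattern {^-?\d+(\.\d+)?$}
    }
    if {![regexp $pattern $canonical]} {
        error "invalid number: $str"
    }
    return $canonical
}

puts [example_format_number 123456.789 "." ","]     ;# 123,456.789
puts [example_parse_number "123,456.789" "." ","]   ;# 123456.789
puts [example_parse_number "123 456,789" "," " "]   ;# 123456.789
```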
Procedure for displaying monetary amounts:
- lc_monetary_currency -label -style num currency locale
- This procedure returns a locale-specific
monetary amount. It takes as input the following parameters:
- -label is an optional switch. If it is set to a true value,
the return value displays a currency alongside the amount.
- -style is an optional switch. If it is set to int, the currency is
displayed as the ISO currency code. By default, the appropriate HTML
entity for the currency is displayed.
- num is a canonical number that represents the amount.
- currency is the currency for this monetary amount.
- locale is the locale in which to format the monetary amount.
Procedure for displaying dates:
- lc_time_fmt datetime fmt locale
- This procedure returns a locale-specific date string. It takes as
input the following parameters:
- datetime is a date string in the form YYYY-MM-DD HH24:MI:SS.
- fmt is a formatting string specified by the ISO 14652
standard.
- locale is the locale in which to display the date.
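To make the idea of a locale-specific date concrete, here is a minimal Tcl sketch that expands a handful of conversion specifiers using per-locale day and month names. It is not the ACS implementation of lc_time_fmt. The names example_names and example_time_fmt are hypothetical, only %a, %B, %d, %Y, %H and %M are handled, and the French names are supplied here for illustration. It assumes Tcl 8.5 or later for clock scan -format.

```tcl
# Hypothetical stand-in for the server-wide locale data.
array set example_names {
    en_US,abday {Sun Mon Tue Wed Thu Fri Sat}
    en_US,mon   {January February March April May June July August September October November December}
    fr_FR,abday {dim lun mar mer jeu ven sam}
    fr_FR,mon   {janvier février mars avril mai juin juillet août septembre octobre novembre décembre}
}

proc example_time_fmt {datetime fmt locale} {
    global example_names
    # Pull the fields out of the canonical YYYY-MM-DD HH24:MI:SS string.
    if {[scan $datetime "%4d-%2d-%2d %2d:%2d:%2d" year month day hour minute second] != 6} {
        error "expected YYYY-MM-DD HH24:MI:SS, got \"$datetime\""
    }
    # Ask Tcl for the weekday (0 = Sunday); -gmt keeps the result
    # independent of the server's time zone.
    set t [clock scan $datetime -format "%Y-%m-%d %H:%M:%S" -gmt 1]
    set weekday [clock format $t -format %w -gmt 1]
    # Substitute each supported specifier in a single pass.
    set map [list \
        %a [lindex $example_names($locale,abday) $weekday] \
        %B [lindex $example_names($locale,mon) [expr {$month - 1}]] \
        %d [format %02d $day] \
        %Y $year \
        %H [format %02d $hour] \
        %M [format %02d $minute]]
    return [string map $map $fmt]
}

puts [example_time_fmt "2000-07-24 14:22:34" "%a %B %d, %Y" en_US]       ;# Mon July 24, 2000
puts [example_time_fmt "2000-07-24 14:22:34" "%a %d %B %Y %H:%M" fr_FR]  ;# lun 24 juillet 2000 14:22
```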
Locale Data
The localization API in Tcl requires a source of
localization data, a list of the decimal separators, thousand
separators and other conventions for each locale.
Much of this data is already
provided by the Linux operating system.
It is stored in the
/usr/share/i18n/locales/ directory and accessed through the
localeconv C library function.
We used a C program which calls localeconv to dump all the
Linux locale data to a file. From this, we have generated a Tcl
library which gets loaded at startup by AOLserver. The locale data
gets stored in a variable that is accessible to all Tcl interpreters
in the server, so that it can be used by the localization API.
To support a locale with the API that is not supported by Linux,
you would need to add a section to the library with appropriate
values for the new locale. The section for the en_US locale
looks like this:
nsv_set locale en_US,abday {{Sun} {Mon} {Tue} {Wed} {Thu} {Fri} {Sat}}
nsv_set locale en_US,abmon {{Jan} {Feb} {Mar} {Apr} {May} {Jun} {Jul} {Aug} {Sep} {Oct} {Nov} {Dec}}
nsv_set locale en_US,am_str "AM"
nsv_set locale en_US,currency_symbol "$"
nsv_set locale en_US,day {{Sunday} {Monday} {Tuesday} {Wednesday} {Thursday} {Friday} {Saturday}}
nsv_set locale en_US,decimal_point "."
nsv_set locale en_US,d_fmt "%m/%d/%y"
nsv_set locale en_US,d_t_fmt "%a %B %d, %Y %r %Z"
nsv_set locale en_US,frac_digits 2
nsv_set locale en_US,grouping {3 3 0}
nsv_set locale en_US,int_curr_symbol "USD "
nsv_set locale en_US,int_frac_digits 2
nsv_set locale en_US,mon_decimal_point "."
nsv_set locale en_US,mon_grouping {3 3 0}
nsv_set locale en_US,mon {{January} {February} {March} {April} {May} {June} {July} {August} {September} {October} {November} {December}}
nsv_set locale en_US,mon_thousands_sep ","
nsv_set locale en_US,n_cs_precedes 1
nsv_set locale en_US,negative_sign "-"
nsv_set locale en_US,n_sep_by_space 0
nsv_set locale en_US,n_sign_posn 1
nsv_set locale en_US,p_cs_precedes 1
nsv_set locale en_US,pm_str "PM"
nsv_set locale en_US,positive_sign ""
nsv_set locale en_US,p_sep_by_space 0
nsv_set locale en_US,p_sign_posn 1
nsv_set locale en_US,t_fmt_ampm "%I:%M:%S %p"
nsv_set locale en_US,t_fmt "%r"
nsv_set locale en_US,thousands_sep ","
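The following Tcl sketch suggests how entries like these could drive the formatting of a monetary amount. It is an illustration under stated assumptions, not the code behind lc_monetary_currency. A plain Tcl array named example_locale stands in for the nsv data so that the example runs outside AOLserver, and the procedure name example_format_money is hypothetical. The en_US values are copied from the listing above. The fr_FR values are assumptions chosen for illustration. Negative amounts and the mon_grouping field are ignored, and groups of three digits are assumed.

```tcl
# Hypothetical stand-in for the nsv locale data (fr_FR values are assumed).
array set example_locale {
    en_US,frac_digits 2  en_US,mon_decimal_point "."  en_US,mon_thousands_sep ","
    en_US,p_cs_precedes 1  en_US,p_sep_by_space 0
    fr_FR,frac_digits 2  fr_FR,mon_decimal_point ","  fr_FR,mon_thousands_sep " "
    fr_FR,p_cs_precedes 0  fr_FR,p_sep_by_space 1
}

proc example_format_money {amount symbol locale} {
    global example_locale
    if {$amount < 0} {
        error "negative amounts are not handled in this sketch"
    }
    # Round to the number of fractional digits given by the locale data.
    set digits $example_locale($locale,frac_digits)
    set fixed [format "%.${digits}f" $amount]
    lassign [split $fixed .] whole frac
    # Group the integer part in threes from the right.
    set grouped ""
    while {[string length $whole] > 3} {
        set grouped "$example_locale($locale,mon_thousands_sep)[string range $whole end-2 end]$grouped"
        set whole [string range $whole 0 end-3]
    }
    set number "$whole$grouped"
    if {$frac ne ""} {
        append number $example_locale($locale,mon_decimal_point) $frac
    }
    # p_cs_precedes decides which side the symbol goes on;
    # p_sep_by_space decides whether a space separates it from the number.
    set gap [expr {$example_locale($locale,p_sep_by_space) ? " " : ""}]
    if {$example_locale($locale,p_cs_precedes)} {
        return "$symbol$gap$number"
    }
    return "$number$gap$symbol"
}

puts [example_format_money 1234.5 {$} en_US]   ;# $1,234.50
puts [example_format_money 1234.5 FFr fr_FR]   ;# 1 234,50 FFr
```

Passing the currency symbol separately from the locale mirrors the API described earlier, where the currency and the locale are independent parameters.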
Data input
Let's look at how users input data in a web form. Each form is made up
of a number of input fields such as the one below:
We need to process the values submitted by the user with the correct
procedure from the localization API to convert the localized input
into a canonical number that can be manipulated in Tcl or inserted
into the Oracle database. In practice, the only type of input that
needs to be modified is numbers. We need to use lc_parse_number
to get the canonical representation of the user's input in the above
form.
The templating module
that is supplied as part of the ACS is particularly suitable for use
with multilingual web sites because it allows caching of translated
forms. A programmer using the templating module specifies the form elements
for a web page using an XML-like syntax, rather than
coding the HTML by hand. Each form element has a number of properties,
but the one that we care about for the purpose of localization is its
datatype. We need to provide localized versions of templating
system datatypes so that the data entry widgets can correctly handle
input from different locales.
Here is a form specification for a text entry box which
uses the lc_number datatype, which accepts localized numbers
as input:
<element status="optional" width=10>
<name>price</name>
<label>Price</label>
<widget>text</widget>
<datatype>lc_number</datatype>
<datamap>
<table>prices</table>
</datamap>
</element>
The datatype of the text widget is set to lc_number, which
automatically passes the input through the lc_parse_number
procedure and attempts to validate it as a number in the user's
locale. Thus a French user could enter 123,45 and the
templating system would automatically convert this to 123.45 before it
got stored as a number in the database.
Source code examples
We can now look at how a programmer uses the localization API to display
localized data on a web page. Here is an example of a Tcl script that
does this:
ns_return 200 text/html "
<table cellpadding=3>
<tr><th></th><th>en_US</th>
<th>fr_FR</th></tr>
<tr><td>Displaying a number</td>
<td>[lc_numeric 123456.789 {} en_US]</td>
<td>[lc_numeric 123456.789 {} fr_FR]</td></tr>
<tr><td>Parsing a number</td>
<td>[lc_parse_number 123,456.789 en_US]</td>
<td>[lc_parse_number "123 456,789" fr_FR]</td></tr>
<tr><td rowspan=2 valign=top>Displaying a monetary amount</td>
<td>[lc_monetary_currency -label 1 -style local 123.4 USD en_US]</td>
<td>[lc_monetary_currency -label 1 -style local 123.4 USD fr_FR]</td></tr>
<tr><td>[lc_monetary_currency -label 1 -style local 1234 FRF en_US]</td>
<td>[lc_monetary_currency -label 1 -style local 1234 FRF fr_FR]</td></tr>
<tr><td>Displaying a date</td>
<td>[lc_time_fmt "2000-07-24 14:22:34" "%c" en_US]</td>
<td>[lc_time_fmt "2000-07-24 14:22:34" "%c" fr_FR]</td></tr>
</table>
"
When this script is run from AOLserver, we get the following
information displayed in a web browser:
| | en_US | fr_FR |
| Displaying a number | 123,456.789 | 123456,789 |
| Parsing a number | 123456.789 | 123456.789 |
| Displaying a monetary amount | $123.40 | 123,40 $ |
| | FFr1,234.00 | 1 234,00 FFr |
| Displaying a date | Mon July 24, 2000 02:22 PM | lun 24 juillet 2000 14:22 |
Now let's look at how a programmer would use the ACS templating module
to display localized data. The programmer specifies data sources for all
information that gets displayed in a web page. The programmer needs to
ensure that each datasource has been localized where appropriate. For
example, here is a data source that displays a localized date.
<datasource>
<name>sysdate</name>
<type>eval</type>
<structure>onevalue</structure>
<condition>
lc_time_fmt [ad_dbquery onevalue "select sysdate from dual"] "%c" $user_locale
</condition>
<comment>
The current date.
</comment>
</datasource>
The second argument to lc_time_fmt, %c, is the ISO 14652 formatting
string which displays an appropriate date and time
representation. With the locale data that we use, a user in the fr_FR
locale, for example, sees the following displayed when this datasource
is included in a web page:
mar 04 juillet 2000 16:30
All numbers and monetary amounts must also be localized with the
appropriate localization API procedures within the
data source.
More information
Oracle localization information http://oradoc.photo.net/ora816/server.816/a76966/toc.htm
Linux locale man page http://man.he.net/man7/locale
Localization support in internet mail http://www.terena.nl/multiling/ml-mua/mldoc-review.html
ISO 14652 standard for software localization http://anubis.dkuug.dk/JTC1/SC22/WG20/docs/14652fcd.txt
ISO 3166 standard for country codes http://wmbr.mit.edu/stations/ISOcodes.html
ISO 639 standard for language names http://userpage.chemie.fu-berlin.de/diverse/doc/ISO_639.html