Dynamic Web Presentation with Java and XSLT (ArsDigita Systems Journal)

by Bill Schneider (bschneid@arsdigita.com)

Submitted on: 2001-06-05
Last updated: 2001-06-05

A brief introduction to XSLT

XSLT ("eXtensible Stylesheet Language Transformations") is a powerful, comprehensive language for performing transformations on XML. It is an increasingly popular technology for content-presentation separation in dynamic web pages; the content for a web page is structured into an XML document, and XSLT is applied to this document to produce an XML output for display in an end-user device. This XML output can be XHTML, WML, VXML, or any other XML-based format.

Since there are standard Java™ APIs for XML parsing and transformations, an application that uses XSLT can be written independently of the specific XSLT engine used. The XSLT engine can then be swapped later for one from a different vendor, without modifying the application.

XAlphabetSoup: Making sense of the standards

XML (eXtensible Markup Language) is a standard format for representing tree-structured data in textual format. XML documents look a lot like HTML (HyperText Markup Language) at first because HTML and XML are both based on SGML (Standard Generalized Markup Language). There are a few important differences between XML and HTML, though:

You can define your own tags in XML. You also get to define a document type (DTD, or "Document Type Definition") which declares valid tags, valid attributes for those tags, and which tags can be nested in which other tags. Most XML parsers can validate the input XML against the DTD.
XML documents must have a single root element, which contains all other tags in your document. In a well-formed HTML or XHTML document, the <html> tag is the root element. Most web browsers will tolerate HTML that is not enclosed in a single <html> tag, though.
All XML tags must have a closing tag, while unclosed tags like <p> and <img> abound in HTML.

DOM (Document Object Model) is an API (available in both C++ and Java, though this document focuses on Java) for parsing XML and manipulating an XML document as a tree structure in memory. The heart of the DOM is the Node object, which represents an element or attribute; a Document is the top-level object that represents an entire XML document.
SAX (Simple API for XML) is an event-driven API for parsing XML; a SAX parser generates callbacks to the programmer-specified DocumentHandler when elements or character blocks start or end. DOM parsers are often implemented using a SAX parser, building the DOM tree in memory in response to SAX callbacks.
XHTML is a reformulation of HTML 4.0 as an XML standard; it is essentially HTML with stricter syntax rules, like mandatory closing tags. This ensures that XHTML documents can be viewed in Web browsers like normal HTML, but can also be parsed as XML documents.
XSLT (eXtensible Stylesheet Language Transformations) is a language for specifying rules to transform an XML document into some other kind of output. It is most often used for rendering XML-formatted data into some human-readable format (XHTML, WML, etc.)
XPath is the expression language inside XSLT for addressing parts of an XML document, with file system directory-tree like syntax.
XSL (eXtensible Stylesheet Language) is a combination of XSLT plus a set of formatting objects. Confusingly, XSL is actually a superset of XSLT. One XSL-based application is DocBook, which uses XSLT to transform a document that uses XML markup to indicate logical sections, into either HTML or printer-friendly PDF. The formatting objects come into play with PDF generation; the XML document is first converted into another, intermediate XML document containing formatting-object elements; this intermediate document is then processed into PDF.
JAXP (Java API for XML Parsing) is an abstraction layer that makes it easier to write code that uses an XML parser in a vendor-neutral way. Despite its name, JAXP is not itself an API for parsing XML. Instead, JAXP provides a standard way to obtain an XML parser (DOM or SAX) without naming a specific vendor's implementation.
TrAX (Transformation API for XML) is a Sun-supported Java standard API for transforming XML documents into other XML documents. This standard API makes it possible to write applications that use XSLT without writing to a specific vendor's XSLT engine. TrAX is now part of JAXP 1.1.
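
The way these pieces fit together can be sketched in a few lines of Java. This is a minimal sketch, not a definitive implementation: the class name, the input document, and the stylesheet are all invented for illustration. JAXP supplies the DOM parser, the stylesheet uses an XPath expression (select='name'), and TrAX runs the transformation, with no vendor-specific class named anywhere.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class JaxpTraxSketch {
    // Hypothetical input document and stylesheet, invented for this sketch.
    static final String XML = "<doc><name>World</name></doc>";
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/doc'>"
      + "<p>Hello, <xsl:value-of select='name'/>!</p>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String transform(String xml, String xsl) {
        try {
            // JAXP: ask the factory for whichever DOM parser is installed.
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            DocumentBuilder builder = dbf.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            // TrAX: ask the factory for whichever XSLT engine is installed.
            TransformerFactory tf = TransformerFactory.newInstance();
            Transformer transformer =
                tf.newTransformer(new StreamSource(new StringReader(xsl)));

            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XML, XSL));
    }
}
```

Because only the factory classes are referenced, swapping the XSLT engine should be a deployment-time change rather than a code change.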

Other systems for content-presentation separation in Web applications

Since XML and XSLT are just one of many mechanisms that can be used for decoupling dynamic content from its HTML presentation in a Web application, it is worth comparing them to other systems that strive to provide the same separation. The other systems discussed here excel at either providing computational power to presentation logic, or maintaining a strong wall of content-presentation separation; XSLT accomplishes both. XSLT also surpasses other presentation systems where it is desirable to easily change the presentation of page components on a site-wide basis, to accommodate a co-branded subsite or a new output type like WML.

JSP

In the early days of server-side Web programming, people would write CGI scripts to directly generate HTML output as string literals. Later, Java servlets followed this model. People soon grew tired of generating complex HTML literals in servlet code with lots of out.println() statements, and it didn't take long for programmers to realize that it was difficult for graphic designers to modify the color scheme, layout, etc. of web pages because an edit to the HTML required editing and recompiling Java code.

As a result, Sun introduced JavaServer Pages (JSPs), which let you write HTML output as HTML and mix in Java code, interpolating values from Java variables or method calls. JSP is functionally similar to Microsoft Active Server Pages (ASP), though ASP predates JSP and the two are based on different sets of technologies. JSPs are translated to servlets on demand, and execute as such. Both JSP and ASP make it possible to write server-side programs that look mostly like regular HTML ("hello world" is a legal JSP) but with dynamic value computation and substitution.

JSP makes it easy to write confusing pages that inextricably mix business logic and presentation logic, so beans and JSP custom tags were introduced to facilitate moving code out of JSPs and into Java classes that could be reused over many JSPs. (Similar mechanisms for moving reusable code out of pages and into objects exist in Microsoft ASP.) The resulting design pattern is known as "Model 1." The later "Model 2" pattern is similar, except that the incoming request is handled instead by a servlet that maintains common application state across all pages (user authentication, authorization, etc.), and then dispatches to a JSP.

There are two major pitfalls with JSP. First, with the full power of Java available inside a JSP, there is no way to enforce the Model 1 or Model 2 convention of confining application logic to Java classes outside the JSP. Second, modifying even simple presentation logic inside a JSP (e.g., "if status code is 'failure', color it red") often requires significant understanding of Java's type system. For example, objects retrieved from generic containers are usually just Objects and need to be cast to their actual (runtime) types before they can be used in comparisons; also, primitives like ints and booleans often have to be unwrapped from their object counterparts Integer and Boolean.
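
To make the second pitfall concrete, here is a small Java sketch of the "color failures red" rule as it might appear in a JSP scriptlet or helper class. The class name, the map keys ("status", "attempts"), and the markup are assumptions made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

class StatusCell {
    // Renders one table cell from a row of name-value data.
    // The container only hands back Objects, so the caller has to know the real types.
    static String render(Map<String, Object> row) {
        String status = (String) row.get("status");                 // cast Object -> String
        int attempts = ((Integer) row.get("attempts")).intValue();  // unwrap Integer -> int
        String color = "failure".equals(status) ? "red" : "black";
        return "<td style=\"color: " + color + "\">"
             + status + " (" + attempts + " attempts)</td>";
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("status", "failure");
        row.put("attempts", Integer.valueOf(3));
        System.out.println(render(row));
    }
}
```

A page designer changing only the color rule still has to read past the casts and the unwrapping, which is the usability cost described above.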

Simplified web presentation systems: Velocity, FreeMarker, etc.

As a result of JSP's shortcomings, there are several JSP-like presentation systems, like the open-source Velocity (http://jakarta.apache.org/velocity) and FreeMarker (http://freemarker.sourceforge.net), for building dynamic web presentation in Java. These systems provide a simplified templating language, which is just HTML plus some scripting elements for performing limited display logic: looping over multi-row table data, substituting dynamic values, etc. An environment containing name-value bindings is filled prior to the output-generation stage. This hides a lot of the details of Java expressions and type casting from template editors, but the types of expressions and computations that can be performed may be limited.

These systems use an architecture similar to Model 2; a request dispatcher servlet responds to an incoming request by dispatching to another servlet which performs application logic, and fills a data structure (usually a java.util.Map or some variant) with a set of name-value mappings. Then the request is dispatched to the templating engine, which processes the template using the dynamic value map for variable substitutions and logic.
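
The name-value model can be illustrated with a toy substitution engine. This is only a sketch of the general idea, not how Velocity or FreeMarker are implemented: the class name and the $name variable syntax handling are assumptions, and there are no loops or conditionals.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TinyTemplate {
    // Matches variables of the form $identifier.
    private static final Pattern VARIABLE = Pattern.compile("\\$(\\w+)");

    // Replaces each $identifier with its value from the context map.
    // Unknown variables are left in the output unchanged.
    static String render(String template, Map<String, Object> context) {
        Matcher m = VARIABLE.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Object value = context.get(m.group(1));
            String text = (value == null) ? m.group() : String.valueOf(value);
            m.appendReplacement(out, Matcher.quoteReplacement(text));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The application logic fills the map; the template never sees Java types.
        Map<String, Object> context = new LinkedHashMap<>();
        context.put("name", "Bill");
        context.put("count", 3);
        System.out.println(render("Welcome $name! You have $count items.", context));
    }
}
```

The template author works only with names, which is what hides Java expressions and casts from them, and also what limits the computations a template can perform.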

The major disadvantage to this type of presentation system is the limitations of the scripting language. To allow for HTML writers to include conditional presentation logic (e.g., "alternate table background color with every other row"), some sort of expression syntax is necessary. But there is a constant tension between providing expressive power and keeping things simple; this type of presentation system is not intended to replace a full-featured programming language like Java, and so the expressive power of such systems is limited. For example, while it may be possible to alternate table row colors for even/odd numbered rows, it may not be possible to alter color using a more complex arithmetic expression computed in the template itself.

Another shortcoming of presentation systems like Velocity is the difficulty of decomposing a page's presentation into components, to maintain a consistent look-and-feel for a component across an entire site. Usually this style of presentation system has a mechanism for including a template inside another template, which can be used to make templated components; but then the template for the component must be explicitly included inside all pages on which the component appears.

Using XSLT in a Web system
On a different axis, XML gained popularity as a standard for representing the data in enterprise systems to deal with the limitations of HTML. HTML is a mishmash of tags to represent logical units of a document, like <p>, <h2>, and <li>; and tags that denote pure markup, like <em> and <strong>. There is no way to denote logical units of a document in HTML without implying a certain look-and-feel in a web browser, though; more on this can be found in Chapter 5 of Philip and Alex's Guide to Web Publishing.
If an enterprise system is already using XML documents to represent data for interchange with other systems, then XSLT allows you to use the same XML document for dynamic HTML generation. Once the content you want displayed on a web page is specified by its logical structure as an XML document, you transform that document with XSLT into some other markup that can be displayed on the user's device. This architecture allows you to generate the same structured data as an XML document, using interchangeable XSLT rules to generate different output types depending on whether you ultimately want XHTML, WML, VoiceXML, etc. One usage pattern for incorporating XSLT as a templating system for Java-based Web applications works as follows:

A servlet performs application-wide logic (user authentication, authorization, etc.) and dispatches the request to the appropriate business logic for the request.
The request-specific logic computes dynamic content for display, and structures it as an XML document. This XML document will usually be produced as a DOM Document object, which can then be transformed directly by XSLT without being parsed first.
The request handler chooses an XSLT stylesheet to use for rendering the content and calls the XSLT engine to render the XML produced above into user-ready output (XHTML, WML, etc.)
The output is served to the user.
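
The steps above can be sketched without a servlet container. In this hedged example the class name, the element names (doc, name), the two stylesheets, and the "wml"/"xhtml" switch are all hypothetical; the point is only that the content is built directly as a DOM Document and handed to the XSLT engine without being serialized and re-parsed.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class RequestFlowSketch {
    private static final String HEAD =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>";

    // Two hypothetical stylesheets for the same content.
    static final String XHTML_XSL = HEAD
      + "<xsl:template match='/doc'><html><body>"
      + "<p>Welcome <xsl:value-of select='name'/>!</p>"
      + "</body></html></xsl:template></xsl:stylesheet>";
    static final String WML_XSL = HEAD
      + "<xsl:template match='/doc'><wml><card>"
      + "<p>Welcome <xsl:value-of select='name'/>!</p>"
      + "</card></wml></xsl:template></xsl:stylesheet>";

    // Step 2: request-specific logic structures its content as a DOM Document.
    static Document buildContent(String userName) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().newDocument();
        Element root = doc.createElementNS(null, "doc");
        Element name = doc.createElementNS(null, "name");
        name.setTextContent(userName);
        root.appendChild(name);
        doc.appendChild(root);
        return doc;
    }

    // Step 3a: the request handler picks a stylesheet for the client's output type.
    static String chooseStylesheet(String outputType) {
        return "wml".equals(outputType) ? WML_XSL : XHTML_XSL;
    }

    // Steps 3b and 4: transform the DOM tree and return the user-ready output.
    static String handle(String userName, String outputType) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(chooseStylesheet(outputType))));
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(buildContent(userName)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(handle("Bill", "xhtml"));
        System.out.println(handle("Bill", "wml"));
    }
}
```

In a real application the authentication and dispatch of step 1 would live in a servlet, and the output would be written to the response rather than returned as a string.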

Advantages of XSLT
XSLT has several advantages over other Web templating systems:

The representation of the structured data is standard, and the same XML data passed to XSLT for transformation into XHTML can also be used to represent data for interchange with other systems, such as B2B exchanges.
XSLT is a rules-based language, so the generated XHTML markup is specified non-positionally. This means it's possible to design a single stylesheet that contains all of the XHTML that will be generated for an application, but output will only be generated from rules that are triggered by the input XML data. This makes it easier to maintain a consistent site-wide style.
The rules-based nature of XSLT encourages the re-use of templates for page components that appear on many pages. In a templating system like Velocity, if you have a page component (e.g., five most recent press releases) that appears on many pages, and desire a consistent markup for the component, you need to move the markup for the component into a separate template and explicitly include it in every template where the component appears.
Because XSLT encourages the consolidation of markup in reusable template rules, it is easier to swap out stylesheets to create a different look-and-feel for a co-branding, or for a different locale or content type (WML, XHTML, printer-friendly XHTML, XML, etc.)
XSLT stylesheets are XML documents, which makes it feasible to manipulate them programmatically to produce new stylesheets on the fly.
The rules-based specification of template rules allows for easier upgradability of a web toolkit's default templates without losing customizations. You can customize the template rule for an embedded component without changing anything in the default higher-level templates; xsl:apply-templates is driven by the XML input, and does not necessarily need the name of a specific template to apply.
XSLT is fully Turing-complete; any computable function of the input XML can be performed in XSLT.
XSLT is an open standard, and there are several open-source XSLT engines available for Java such as Xalan and Saxon.

Disadvantages of using XSLT--and overcoming them

Usability
The most immediately apparent disadvantage to using XSLT is its complexity. It's significantly harder for both programmers and non-programmers to learn XSLT than a gentle-slope, HTML-with-substitutions system like Velocity. The rules-based evaluation may be unfamiliar to many programmers at first, and the syntax is much more complicated.
However, it is possible to use XSLT in a simplified way (see http://www.xfront.com/rescuing-xslt.html); using XSLT as a per-page templating system like Velocity sacrifices some of the site-wide styling power of XSLT but makes for a good introduction to XSLT.
Compare the following two code examples:
Velocity:

<HTML>
<HEAD>
  <TITLE>Welcome</TITLE>
</HEAD>
<BODY>
  Welcome $name! <br />
  List items:
  <ul>
  #foreach ( $item in $items )
    <li>Type: $item.type, Quantity: $item.quantity</li>
  #end
  </ul>
</BODY>
</HTML>

XSLT:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/doc">
<HTML>
<HEAD>
  <TITLE>Welcome</TITLE>
</HEAD>
<BODY>
  Welcome <xsl:value-of select="name"/>! <br />
  List items:
  <ul>
  <xsl:for-each select="items">
    <li>Type: <xsl:value-of select="@type"/>, Quantity: <xsl:value-of select="@quantity"/></li>
  </xsl:for-each>
  </ul>
</BODY>
</HTML>
</xsl:template>
</xsl:stylesheet>
Although the Velocity syntax is more compact than the XSLT syntax, they say the same thing essentially the same way. The sections of generated HTML with dynamically interpolated data are nearly identical in structure, though the XSLT contains some additional surrounding code (the XML declaration and the xsl:stylesheet and xsl:template wrappers).
The main difference is that in the XSLT version, you can decouple the rule for rendering an individual list item into a separate template rule and invoke it with <xsl:apply-templates>; this makes it easier to change the styling for a list item of this particular type independently of the rest of the page. You can do something similar in Velocity with the #parse directive (or #include for static text), but you have to explicitly name the template to include, which couples the presentation of a list item component more closely to the presentation of the page as a whole.
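
As a rough illustration of that decoupling, the following Java sketch runs a stylesheet in which the page rule only says "apply templates to the items" and a separate rule decides what one item looks like. The class name and the input structure (doc/items/item with type and quantity attributes) are assumptions invented for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ApplyTemplatesSketch {
    // Hypothetical input: a document with a name and a list of items.
    static final String XML =
        "<doc><name>Bill</name><items>"
      + "<item type='widget' quantity='3'/>"
      + "<item type='gadget' quantity='1'/>"
      + "</items></doc>";

    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        // Page-level rule: it never names the template used for an item.
      + "<xsl:template match='/doc'>"
      + "<ul><xsl:apply-templates select='items/item'/></ul>"
      + "</xsl:template>"
        // Component-level rule: can be replaced without touching the page rule.
      + "<xsl:template match='item'>"
      + "<li>Type: <xsl:value-of select='@type'/>, "
      + "Quantity: <xsl:value-of select='@quantity'/></li>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String render(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render(XML, XSL));
    }
}
```

Restyling list items then means supplying a different match='item' rule; the page-level rule stays as it is.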
Performance

Another potential disadvantage of XSLT is performance. Because an XML document is a more elaborate data structure than a simple name-value mapping, it necessarily will take more memory. And, since stylesheets themselves are XML documents, they need to be parsed as such, which takes time; or have their parse trees stored in memory, where they could take more space than a simple HTML-with-substitution template.

Also, because the final output is specified non-positionally in XSLT, computing the final XHTML output can take longer than in a simpler templating system, where the order of literal HTML text in the template is the same as the order of literal text in the output.

There's much that can be done to mitigate concerns about performance and scalability of an XSLT-based presentation system, though.

Because XSLT is an open standard, we can always swap out the XSLT engine in an ACS web application for a faster one without changing any of our own code. For high-performance commercial applications, we could use a faster, optimized commercial engine. For example, although it probably doesn't work nicely with Java right now, it's worth noting that one of the best-performing XSLT translation engines is made by Microsoft.
XSLT stylesheet parse trees can be cached in memory, to lessen the cost from parsing stylesheets as XML documents. The cost associated with parsing is significant compared to the cost of the transformation itself.
The TrAX API allows us to transform XML documents using not a stylesheet directly, but a Templates abstraction around it. This allows for representing an XSLT transformer not as an XML document, but as some special-case, optimized object that takes less memory than an XML document. This also raises the possibility of a JSP-style system in which stylesheets would be compiled on demand to Java classes cached in memory, and stored in the file system for persistence across restarts.
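
A minimal sketch of the caching idea, using the TrAX Templates interface: the stylesheet is compiled once and each request gets a cheap new Transformer from the cached Templates object. The class name, the cache key scheme, and the sample stylesheet are assumptions for illustration, not a description of any particular system's cache.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TemplatesCache {
    // Hypothetical stylesheet used by the demo in main().
    static final String HELLO_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/doc'><p>Hello, <xsl:value-of select='name'/>!</p></xsl:template>"
      + "</xsl:stylesheet>";

    // Note: TransformerFactory is not guaranteed to be thread-safe;
    // a real server would guard or pool it.
    private final TransformerFactory factory = TransformerFactory.newInstance();
    private final Map<String, Templates> cache = new ConcurrentHashMap<>();
    private final AtomicInteger compiles = new AtomicInteger();

    // Parses and compiles the stylesheet; this is the expensive step we want to do once.
    private Templates compile(String xsl) {
        try {
            compiles.incrementAndGet();
            return factory.newTemplates(new StreamSource(new StringReader(xsl)));
        } catch (Exception e) {
            throw new IllegalStateException("stylesheet did not compile", e);
        }
    }

    // Looks up (or compiles) the Templates for this key, then transforms the input.
    String render(String key, String xsl, String xml) {
        Templates templates = cache.computeIfAbsent(key, k -> compile(xsl));
        try {
            Transformer transformer = templates.newTransformer(); // one per request
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    int compileCount() {
        return compiles.get();
    }

    public static void main(String[] args) {
        TemplatesCache cache = new TemplatesCache();
        System.out.println(cache.render("hello", HELLO_XSL, "<doc><name>Ann</name></doc>"));
        System.out.println(cache.render("hello", HELLO_XSL, "<doc><name>Bob</name></doc>"));
        System.out.println("stylesheet compilations: " + cache.compileCount());
    }
}
```

Templates objects are designed to be shared across threads, while each Transformer should be used by one thread at a time, which is why the cache holds the former and creates the latter per request.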

Page scripting

A page script is something like a JSP or ASP, where the logic performed on a URL request is defined inside a file in the file system, and the logic is mapped to a particular URL by the virtue of its pathname. The most important characteristic of page scripting is that changes to the file, as well as additions of new files or removals of old files, take effect immediately when the URL is next requested, with no additional explicit action (e.g., recompilation) required.
Although XSLT templates are written on a per-element basis rather than a per-URL basis, they are still useful in page scripting environments. If we have a page.jsp page script and a page.xsl stylesheet, the template rules contained in page.xsl can be combined transparently with global styling rules, so that page.xsl need only contain a set of per-URL overrides.
How ArsDigita uses XSLT
We use XSLT in all versions of ACS starting with version 4.5. We chose to use it because of its strengths in building a component-based system and for global styling. XSLT makes it easy for us to make the presentation for page components swappable on a global basis, and to maintain consistent styling of components on all pages where they appear.
Any ACS-based system is organized as a series of applications; each application type can have one or more instances which are used to scope the content for a particular application. This allows us to have a single news application, for example, but with different isolated sets of news articles displayed through different URLs. Application instances are mounted on site nodes, which are just URL stubs in the site-wide URL tree. So when you request a URL, that URL is within a particular site node, which is associated with a package instance, which is associated with an application type (the code base to use) and a unique identifier for isolating content within that package.
The requested URL resolves further to perform a specific action within the package and display a response page to the user. Each component in the response page generates a piece of an XML document. XSLT renders the complete document, with contributions from all page components, into the proper output format for the user's client (usually XHTML).
We associate XSLT stylesheets with package types, so any given ACS application defines a stylesheet with all the necessary rules for rendering the XML it generates. We also associate XSLT stylesheets with site nodes, so that for any particular subsite, we can override the default rules for the package. This is useful for co-branding and other localized look-and-feel changes.
The effective stylesheet for a request is dynamically composed from a series of stylesheets; the composed effective stylesheet can then be cached for future requests. When a request comes in for a particular site node, we build a stylesheet using the stylesheet associated with that site node and all of its ancestors in the site map. Then we incorporate the stylesheets for all relevant package defaults. (Remember, these rules don't generate any output unless some piece of the generated content triggers them.) The site-node specific rules will take precedence over the package defaults.
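
The article does not say how ACS performs this composition internally. One standard XSLT mechanism that yields the same kind of precedence is xsl:import, where a later import overrides an earlier one. The sketch below assumes that mechanism; the class name, the "mem:" hrefs, the in-memory stylesheet store, and the element names are all hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class StylesheetComposer {
    static final String OPEN =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>";
    static final String CLOSE = "</xsl:stylesheet>";

    // Builds a wrapper stylesheet that imports the given stylesheets.
    // hrefs must be ordered from LOWEST to HIGHEST precedence, because in
    // XSLT 1.0 a later xsl:import takes precedence over an earlier one.
    static String compose(List<String> hrefsLowToHigh) {
        StringBuilder sb = new StringBuilder(OPEN);
        for (String href : hrefsLowToHigh) {
            sb.append("<xsl:import href='").append(href).append("'/>");
        }
        sb.append("<xsl:output method='xml' omit-xml-declaration='yes'/>");
        return sb.append(CLOSE).toString();
    }

    // Transforms xml with the composed stylesheet; imports are resolved from the store.
    static String render(String xml, List<String> hrefsLowToHigh, Map<String, String> store) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            factory.setURIResolver((href, base) -> {
                String text = store.get(href);
                return text == null ? null : new StreamSource(new StringReader(text), href);
            });
            Transformer transformer = factory.newTransformer(
                new StreamSource(new StringReader(compose(hrefsLowToHigh))));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Hypothetical stylesheets: package defaults, main site node, co-branded site node.
    static Map<String, String> demoStore() {
        Map<String, String> store = new HashMap<>();
        store.put("mem:package-default", OPEN
            + "<xsl:template match='page'><div><xsl:apply-templates/></div></xsl:template>"
            + "<xsl:template match='greeting'><p>default greeting</p></xsl:template>"
            + "<xsl:template match='footer'><p>default footer</p></xsl:template>" + CLOSE);
        store.put("mem:site-root", OPEN + CLOSE); // no overrides at the main site node
        store.put("mem:cobrand", OPEN
            + "<xsl:template match='greeting'><p>cobrand greeting</p></xsl:template>" + CLOSE);
        return store;
    }

    public static void main(String[] args) {
        List<String> order = Arrays.asList("mem:package-default", "mem:site-root", "mem:cobrand");
        System.out.println(render("<page><greeting/><footer/></page>", order, demoStore()));
    }
}
```

Here the co-brand rule for greeting wins while the footer falls through to the package default, which mirrors the precedence described above. The composed wrapper is a good candidate for caching as a compiled Templates object.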
XSLT code example: global styling and embeddable components
Suppose we have a simple News application that displays a list of press releases. We would represent a list of press releases as an XML document:
<!-- http://example.com/news is a placeholder namespace URI -->
<news:press-release-list xmlns:news="http://example.com/news">
  <news:press-release href="one-article?id=2837" title="ArsDigita uses XSLT"/>
  <news:press-release href="one-article?id=2838" title="Apache releases open-source XSLT engine"/>
  <news:press-release href="one-article?id=2839" title="Sun adopts new Java APIs for XML"/>
</news:press-release-list>
... and we would convert an XML list of press releases into HTML using the following XSLT stylesheet:
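<xsl:stylesheet version="1.0" 
                   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                   xmlns:news="http://example.com/news"
                   exclude-result-prefixes="news">
<!-- http://example.com/news is a placeholder; it must match the namespace URI of the news XML -->

<xsl:template match="news:press-release-list">
  Press releases
  <ul>
  <xsl:for-each select="news:press-release">
    <li>
      <a>
        <xsl:attribute name="href">
          <xsl:value-of select="@href"/>
        </xsl:attribute>
        <!-- title is an attribute of news:press-release, so select it with @ -->
        <xsl:value-of select="@title"/>
      </a>
    </li>
  </xsl:for-each>
  </ul>
</xsl:template>

<!-- more XSL template rules -->

</xsl:stylesheet>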
Within ACS, we would associate this stylesheet with the News application, as its default stylesheet. Now, any page in the News application will display a list of press releases with the same, consistent look-and-feel.
Now, if we want to embed the list of press releases as a component inside a page in a different application, we don't have to touch the XSLT stylesheets or their associations. ACS always makes all packages' registered default stylesheets available to all applications, all the time. If the XML document generated for a page request contains a <news:press-release-list> element, the list of press releases will be included in the HTML output looking the same as if it were generated from a page in the News application itself.

Let's turn to co-branded subsites now. Suppose we have two URLs on our web site, /news/index and /cobrand/news/index, each of which displays the same list of press releases; but the URL under /cobrand should display the list with a different stylesheet specific to the co-brand:
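<xsl:stylesheet version="1.0" 
                   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                   xmlns:news="http://example.com/news"
                   exclude-result-prefixes="news">
<!-- http://example.com/news is a placeholder; it must match the namespace URI of the news XML -->

<xsl:template match="news:press-release-list">
  MySite.com Press Releases 
  <table>
  <xsl:for-each select="news:press-release">
    <tr>
      <td><img src="my-site-icon.gif" /></td>
      <td><a>
        <xsl:attribute name="href">
          <xsl:value-of select="@href"/>
        </xsl:attribute> 
        <!-- title is an attribute of news:press-release, so select it with @ -->
        <xsl:value-of select="@title"/>
      </a></td>
    </tr>
  </xsl:for-each>
  </table>
</xsl:template>

<!-- more XSL template rules -->

</xsl:stylesheet>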
This stylesheet displays press releases as table rows instead of list items and displays a specific graphic with each row.
To make this stylesheet take effect for the /cobrand/news/index URL, we register the stylesheet with the /cobrand site node in the site map. When we request /cobrand/news/index, the XSLT stylesheet used for the final request processing is composed of the stylesheets associated with:

the /cobrand site node
the / (main site) site node
the default News package stylesheet
... in that order of precedence.
The new stylesheet for press releases is in effect throughout the entire co-branded subsite. So if a press-release list is embedded inside a page outside of the News application, the co-branded stylesheet still applies.

The process for swapping stylesheet rules based on locale or output type (e.g., WML instead of HTML) is similar.
Everything in the above example could also be done with some other web presentation system like JSP or Velocity. XSLT is more powerful, though, for a number of reasons:

Template rules are supplied on a component-by-component basis (where a component is represented as a subtree in an XML document), so it is clear how a stylesheet for a co-branded subsite can override the default presentation for some components while retaining the default presentation for others.
If the default template rules change (as they could in an ACS upgrade), then the new default template rules can be incorporated alongside existing customized template rules easily.
XSLT stylesheets associated with a subsite, locale, or output type for a particular component can also override the presentation for the parts of the page outside their component's output. For example, the co-branded News stylesheet could also include a template match="/" rule to change the way the top-level page layout looks.

Conclusions
XSLT is promising for web presentation because it lends itself well to web applications built from components that must have a consistent appearance on a wide variety of pages where they may appear. There are many advantages to using XSLT for web applications, such as co-branded, localized portals.
On the other hand, XSLT can be difficult to learn and, while early results at ArsDigita have been promising, there are still some lingering concerns about performance and usability. Although XSLT may make it more difficult to build a simple web application's presentation layer, it makes the presentation layer for a complex web application not much more difficult than that for a simple application. Using XML for an intermediate data-representation stage is also promising because it opens doors for interoperability with other enterprise systems.

Links to more resources

W3C documents

XSLT (http://www.w3.org/TR/xslt)
XPath (http://www.w3.org/TR/xpath)
XHTML 1.0 (http://www.w3.org/TR/xhtml1)
Canonical XML 1.0 (http://www.w3.org/TR/xml-c14n)

Open-source XSLT engines for Java

Saxon (http://users.iclway.co.uk/mhkay/saxon)
Xalan (http://xml.apache.org/xalan-j)
Java and all Java-based marks are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and other countries

Reader's Comments

We used XSLT for an application and even if it was flexible, we needed real programmers to twist and turn the templates, especially when reviewing the site with customers and trying to obtain real time changes to get the site with the correct look and feel -- don't ask this to a designer...
Performance was also abysmal (well acceptable for moderate number of users) (with XALAN) - even when using caching of results. Slightly better with SAXON or XALAN Transformers we tried later. Nevertheless, slow as hell when we contrast it with an ASP application or a template engine like Velocity (which can be integrated to work with DreamWeaver for example).
This all depends on the style of the shop. More IT-oriented and XSLT is very seducing, more design-oriented and XSLT is a nightmare.
A point of note: you need to be able to understand the templates after the project has been released for maintenance tasks... not always that easy given the 'super-neat' syntax of XSL.
Anyway the code was 1000 times cleaner than with a JSP application.
One thing I find better in Velocity templates than in XSL ones is that in Velocity you can pass complete Java objects directly to the template, avoiding the pain of putting the data into an intermediate XML structure which gets massaged by the XSLT processor. Even more, you can call 'set' methods from the template and get the results back into the Java objects with no fuss -- try this with XSLT! (You can go even further since WebMacro uses introspection [you drop your session object in the context and all session objects are available - how cool!].)
The best approach I can think of is to combine the two:
Use Velocity-like templates for containers and easy components and create a #XSLT-process directive in the template language for embedding XML elements. I think WebMacro does it so you may be able to pick the directive from WebMacro and plug it into your Velocity codebase. FreeMarker may have it, I don't know.
This will allow for ease of use for the designers for 80% of the design and require a developer for the 20% of complex stuff (compare this to tying a programmer to tasks he doesn't excel at [graphics and looks] for 100% of his time). This is peopleware but this is how things move forward.
Another issue we had was that XSL has no real variables, you bind a value to a variable, it's for its scope only (a page). So, in a for each, you cannot have a variable incremented (like do a sum in a for - okay it can be done with a xpath and some aspirins). So, what we did is to output Javascript which in turn computed stuff. Not clean! Don't tell me that it would be possible with a recursive template - try to explain this to a designer!
xsl-script, which was in a W3C draft (and which Microsoft implemented), would have been useful for the above, but it is not in XALAN or the current spec.
Comments welcome!

-- Philippe Back, August 14, 2001

If you desire faster XML / XSLT performance then grab Resin http://www.caucho.com which includes an XML parser and XSLT transformer that are the fastest I know of. All of resin does much more than this (it's a servlet engine) but you can find docs that explain how to use its XML assets stand-alone. The source code for Resin is available and the product only costs money ($500) in certain situations.

-- David Smiley, February 4, 2002