Spread Your Wings with Cocoon
Jun. 1, 2002 12:00 AM
Apache Cocoon is one of the most interesting, innovative, and powerful platforms for dynamic content generation. A subproject of the Apache XML project, it is also one of the lesser-known offerings from the folks at the all-open-source Apache Software Foundation, having garnered less attention than some of its more popular cousins like Struts. But Cocoon is worth a look.
It's not just Cocoon's use of XML in content generation that makes it so interesting; it's how it uses XML. Cocoon's authors clearly have deep experience with XML and an understanding of what it is and isn't good for, and Cocoon's simple but powerful architecture reflects that experience. XML isn't used here just "because everyone's using it." Rather, Cocoon exploits XML's strength for separating content from presentation. (As we know now, the lack of that separation made it increasingly difficult to do Web page development in straight HTML.) The result is an innovative and powerful tool for content site developers.
This article will familiarize you with Cocoon and some of its related technologies: what it is, what it does, and how to start using it in your own development projects. A basic understanding of the core concepts behind XML, SAX, and XSL (and, of course, HTML) is helpful when reading this article. Don't worry too much if you haven't worked with these technologies, though. I won't be delving too deeply into them and will try to make any examples easy to understand.
What Is Cocoon?
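Cocoon's core difference is its use of XML throughout the content-generation process. Each request sent to the Cocoon framework is processed using the same three steps:
- Content is generated as XML data.
- The XML data is transformed as needed.
- The result is serialized into the appropriate output format and returned in the response.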
Another major plus of Cocoon's XML-orientation: it provides for excellent separation of content and presentation - that holy grail of software applications. Content is kept as presentation-free XML data for as long as possible during processing, and then formatted into the appropriate output format just before being returned to the user. In fact, Cocoon strives for an even greater separation of concerns. Its philosophy is to look upon the process of content generation as three separate realms: content, logic, and style. This type of division makes a great deal of sense, especially when you consider that completely different teams of people are frequently assigned to each of these functions: logic to software developers, content to users and data entry staff, and style to graphic designers.
How Cocoon Was Hatched
Cocoon 1.0
As with all early-version software though, it's often difficult to foresee the potential usability problems that will crop up in practice while the software is still being developed. It's also difficult to envision how popular your application will be at such an early stage. Cocoon was no different. Version 1.0, although functional, had its usability hampered by design decisions made early on, most notably its reliance on the memory-intensive XML DOM architecture. (The SAX model, and the APIs and tools needed to use it, were still in their infancy at that point.) Since Cocoon was proving to be quite popular, demands for new features and improved performance kept coming in. It soon became clear that the initial architecture was not adequate to address these issues.
Cocoon 2.0
The first notable improvement is the substitution of the event-driven SAX XML standard for the memory-intensive DOM API. In addition to the improvements in memory efficiency and scalability, the SAX model also allows output to be generated incrementally. This provides a faster response time, since a response page is returned little by little rather than only after all processing is complete (as the DOM model required).
The second major improvement concerns the internal architecture of the Cocoon application. The original structure, built on a Reactor design pattern, apparently caused conceptual as well as implementation difficulties. Version 2.0 substitutes a pipeline architecture (described later) that proved far more flexible to code as well as much clearer conceptually.
The result is a solid, well-tested, powerful, and more efficient framework for just about any type of content generation under the sun. At the same time it manages to elegantly achieve true separation of presentation and content. In short, Cocoon really rocks!
How Does Cocoon Work?
A Servlet at Heart
The Pipeline Architecture
A pipeline at its simplest consists of a sequence of the three core Cocoon components - generators, transformers, and serializers - arranged in a chain (see Figure 1). XML data (SAX events) is passed down the chain, with each component performing its own processing on the data as needed. At the end of the chain the events are serialized out to the response's OutputStream and returned to the client making the request.
Generators, Transformers, and Serializers
The last component in the chain is always the serializer. The serializer's job is to turn the stream of XML events into some form of output that will be returned in the response. Prebuilt serializers are available to create output in the most popular formats: XML, HTML, text, WML, SVG images, and more. Again, over a dozen varieties of serializers are included with the Cocoon distribution, and you can easily roll your own to support just about any output format you like.
As an option, a sequence of one or more transformers can lie in between the generator and the serializer. Transformers allow the developer to manipulate the XML events coming down the pipeline - adding, removing, or modifying events as needed - before the serializer finally sends them back in the response. The XSLT transformer is the most common - and most powerful - transformer. It runs an XSL stylesheet against the stream of XML events coming down the pipeline, allowing the developer to use the powerful XSLT language to transform the XML from pure data into styled output. You can place multiple transformers in a row in the pipeline, each of which will operate on the XML events in turn. This allows you to style the data incrementally, and can help keep your stylesheets smaller and simpler.
Although Cocoon uses several other types of components as well (which are beyond the scope of this article), these three components are the core of its architecture. Pretty simple, huh? But it sure is powerful! By assembling combinations of these core components - along with your own custom-built server pages and stylesheets - you can build pipelines to generate content from any data source you like, styled however you like, and rendered in whatever output format you like.
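The generator, transformer, serializer chain can be sketched with nothing more than the SAX and JAXP classes that ship with the JDK. This is not Cocoon's own API; the class name SaxPipelineSketch, the RenameFilter stage, and the <note>/<message> element names are assumptions made up for the illustration. A SAX parser plays the generator, an XMLFilterImpl subclass plays a transformer, and an identity TransformerHandler plays the serializer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

// Illustrative only: plain JDK classes standing in for Cocoon's components.
class SaxPipelineSketch {

    // Hypothetical rule: <note> elements become <message> elements.
    static String rename(String name) {
        return "note".equals(name) ? "message" : name;
    }

    /** "Transformer" stage: rewrites element names, passes every other event through. */
    static class RenameFilter extends XMLFilterImpl {
        @Override
        public void startElement(String uri, String local, String qName, Attributes atts)
                throws SAXException {
            super.startElement(uri, rename(local), rename(qName), atts);
        }

        @Override
        public void endElement(String uri, String local, String qName) throws SAXException {
            super.endElement(uri, rename(local), rename(qName));
        }
    }

    /** Runs generator -> transformer -> serializer and returns the serialized text. */
    static String run(String xml) {
        try {
            // "Generator": a SAX parser that turns source text into a stream of events.
            SAXParserFactory parsers = SAXParserFactory.newInstance();
            parsers.setNamespaceAware(true);
            XMLReader generator = parsers.newSAXParser().getXMLReader();

            // "Transformer": sits between the generator and the serializer.
            RenameFilter transformer = new RenameFilter();
            transformer.setParent(generator);

            // "Serializer": an identity handler that writes the events out as XML text.
            SAXTransformerFactory factory =
                    (SAXTransformerFactory) SAXTransformerFactory.newInstance();
            TransformerHandler serializer = factory.newTransformerHandler();
            serializer.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.setResult(new StreamResult(out));

            // Wire the chain together and push the events through it.
            transformer.setContentHandler(serializer);
            transformer.parse(new InputSource(new StringReader(xml)));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("pipeline failed", e);
        }
    }
}
```

Calling SaxPipelineSketch.run("<doc><note>Hi</note></doc>") pushes the events through all three stages. Adding a second filter between the first one and the serializer would correspond to placing another transformer in the pipeline.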
Putting It All Together
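Our "Hello World" pipeline will work as follows:
- The file generator reads the HelloWorld.xml file.
- The XSL transformer applies the Style.xsl stylesheet.
- The HTML serializer formats the resulting XML event stream as HTML.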
HelloWorld.xml
<?xml version="1.0"?>
The HelloWorld.xml file is very simple, consisting of a single node (<message>) that contains a single attribute (text). The Style.xsl stylesheet (see Listing 1) is also very simple, consisting of only two formatting transformations. The first one (xsl:template match="/") is called when the XSLT processor begins processing the document. It generates the skeleton of an HTML page. The body of the page is left empty, however, except for the XSL instruction xsl:apply-templates. This instruction simply commands the XSL processor to begin processing any child nodes here, applying other templates as needed. The net effect of the instruction then is "transform any child nodes here."
In this case there's only one node in the XML file, a <message> node, and only one remaining template in the stylesheet (xsl:template match="message"), which is looking to match <message> nodes. Since the stylesheet's template matches the XML file's node, we'll perform the second transformation:
Once Cocoon executes this pipeline and performs this transformation, the result is the following HTML:
<html>
This HTML is then serialized using the HTML serializer (which takes care of the few incompatibilities between strongly formatted XML and loosely formatted HTML) and the whole thing is sent back in the response to end the request.
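The same generate, transform, serialize sequence can be tried outside Cocoon with the JDK's built-in XSLT processor. The sketch below is an assumption-laden stand-in, not the article's listings: the attribute value "Hello World!", the <h1> markup, and the class name HelloWorldSketch are invented for the example. Only the overall shape (a <message> node with a text attribute, one template matching "/" and one matching "message") comes from the description above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Illustrative only: JAXP standing in for Cocoon's file generator,
// XSL transformer, and HTML serializer.
class HelloWorldSketch {

    // Assumed stand-in for HelloWorld.xml: one <message> node with a text attribute.
    static final String HELLO_XML =
            "<?xml version=\"1.0\"?>"
            + "<message text=\"Hello World!\"/>";

    // Assumed stand-in for Style.xsl: a page skeleton plus a template for <message>.
    static final String STYLE_XSL =
            "<?xml version=\"1.0\"?>"
            + "<xsl:stylesheet version=\"1.0\""
            + "    xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "  <xsl:output method=\"html\"/>"
            + "  <xsl:template match=\"/\">"
            + "    <html><head><title>Hello</title></head>"
            + "    <body><xsl:apply-templates/></body></html>"
            + "  </xsl:template>"
            + "  <xsl:template match=\"message\">"
            + "    <h1><xsl:value-of select=\"@text\"/></h1>"
            + "  </xsl:template>"
            + "</xsl:stylesheet>";

    /** Applies the stylesheet to the XML and returns the HTML text. */
    static String render() {
        try {
            Transformer stylesheet = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLE_XSL)));
            StringWriter html = new StringWriter();
            stylesheet.transform(new StreamSource(new StringReader(HELLO_XML)),
                    new StreamResult(html));
            return html.toString();
        } catch (Exception e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }
}
```

HelloWorldSketch.render() returns an HTML page whose body holds the heading produced by the "message" template. The xsl:output method="html" declaration plays roughly the role that the HTML serializer plays in the Cocoon pipeline.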
The Sitemap
Cocoon uses a file called the sitemap (sitemap.xmap) to define all the pipelines in your application. The sitemap is just that, a map of your Cocoon Web site. It defines which pipeline will be run in response to each site request, and how exactly each pipeline will generate its response page. The sitemap is written in, guess what? XML, just like everything else in Cocoon. Let's look at it piece by piece. First, all sitemaps must contain the <map:sitemap> root element:
<?xml version="1.0"?>
Then the sitemap lists which Cocoon components your site will use (see Listing 2). In this case we'll be using only three components:
- the file generator
- the XSL transformer
- the HTML serializer
We'll also have to define an additional component, a matcher, to get this sitemap to work. A matcher is used to match the URL that the user enters and route it to the appropriate pipeline. (We won't discuss matchers in this article though.) Then we define the pipelines used in the site. In this case we have only one, our Hello World pipeline, which we will set up to be executed when a request arrives for page "HelloWorld.html". The pipeline calls the file generator to read from the HelloWorld.xml file, then calls the XSL transformer to apply the Style.xsl stylesheet, and finally calls the HTML serializer to properly format the XML event stream as HTML. Since earlier in the sitemap we defined each of these components to be the default of its type (see Listing 2), we can use a shortcut and not explicitly write which component we're using; Cocoon assumes we're using the default. (However, if we were calling a generator other than the file generator, a JSP page, for example, we would need to write something like <map:generate type="jsp" src="HelloWorld.jsp"/>.) The full pipeline reads like this:
<map:pipelines>
Finally, we write a closing tag for the root element:
</map:sitemap>
And that's it. Our complete Hello World sitemap reads like Listing 3.
Installing and Running Cocoon
Installing Cocoon onto the server is pretty easy for most servlet engines and usually consists of the following:
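Conceptually, the sitemap is a lookup table from request paths to pipeline definitions. The sketch below models that idea in plain Java; it is not how Cocoon represents a sitemap internally, and the SitemapSketch and Pipeline classes are hypothetical. It also handles exact path matches only, whereas real matchers are more capable.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative model only: a sitemap viewed as "request path -> pipeline".
class SitemapSketch {

    /** One pipeline: where to generate from, what to transform with, how to serialize. */
    static final class Pipeline {
        final String generateSrc;
        final String transformSrc;
        final String serializer;

        Pipeline(String generateSrc, String transformSrc, String serializer) {
            this.generateSrc = generateSrc;
            this.transformSrc = transformSrc;
            this.serializer = serializer;
        }

        String describe() {
            return "generate(" + generateSrc + ") -> transform(" + transformSrc
                    + ") -> serialize(" + serializer + ")";
        }
    }

    private final Map<String, Pipeline> matches = new LinkedHashMap<>();

    /** Registers the pipeline to run when a request arrives for the given path. */
    void match(String path, Pipeline pipeline) {
        matches.put(path, pipeline);
    }

    /** Returns the pipeline for a request path, or null when nothing matches. */
    Pipeline resolve(String requestPath) {
        return matches.get(requestPath);
    }

    /** The Hello World site: a single pipeline bound to HelloWorld.html. */
    static SitemapSketch helloWorld() {
        SitemapSketch sitemap = new SitemapSketch();
        sitemap.match("HelloWorld.html", new Pipeline("HelloWorld.xml", "Style.xsl", "html"));
        return sitemap;
    }
}
```

With this model, SitemapSketch.helloWorld().resolve("HelloWorld.html") yields the pipeline that reads HelloWorld.xml, applies Style.xsl, and serializes as HTML, while any other path resolves to null.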
More details and instructions for specific servlet engines can be found on the installation page at the Cocoon Web site: http://xml.apache.org/cocoon/installing/.
Once Cocoon has been installed, running it is just a matter of accessing a URL that's handled by the Cocoon servlet. When a request to such a URL is made, it is routed to the Cocoon servlet. Cocoon matches the URL against its sitemap and then executes the appropriate pipeline.
To run our Hello World site, we first need to take the sitemap we just wrote and use it to overwrite the sample sitemap.xmap file that Cocoon provides by default. Then we just point our browser to http://localhost:8080/cocoon/HelloWorld.html and - voilà - Cocoon serves up our dynamically generated "Hello World" page.
The Power of Cocoon
Since Cocoon provides complete separation of content from style, you can take the same content and format it in many different ways. There's no need to create new logic or content in order to create different looks for your site. Just create a new stylesheet for each output format, and you can serve up completely different-looking sites from the same content. How could this be useful in practice? Imagine the following possibilities, all of which can be accomplished with ease using Cocoon. You could create sites that serve out the same content formatted completely differently based on:
Clearly, Cocoon's ability to do dynamically styled page generation is a powerful tool for site designers!
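To make the "same content, many styles" idea concrete, here is a small sketch using the JDK's XSLT processor rather than Cocoon itself. The <news> content, the two stylesheets, the style keys "html" and "text", and the class name MultiStyleSketch are all assumptions invented for the example. The point is only that the content string never changes while the stylesheet does.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Illustrative only: one piece of content rendered through two stylesheets.
class MultiStyleSketch {

    // Hypothetical presentation-free content.
    static final String CONTENT = "<news><item>Cocoon 2.0 released</item></news>";

    private static final String XSL_OPEN =
            "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">";

    // Style 1: an HTML bulleted list.
    static final String HTML_STYLE = XSL_OPEN
            + "<xsl:output method=\"html\"/>"
            + "<xsl:template match=\"news\"><ul><xsl:apply-templates/></ul></xsl:template>"
            + "<xsl:template match=\"item\"><li><xsl:value-of select=\".\"/></li></xsl:template>"
            + "</xsl:stylesheet>";

    // Style 2: plain text, one line per item.
    static final String TEXT_STYLE = XSL_OPEN
            + "<xsl:output method=\"text\"/>"
            + "<xsl:template match=\"item\">* <xsl:value-of select=\".\"/>"
            + "<xsl:text>&#10;</xsl:text></xsl:template>"
            + "</xsl:stylesheet>";

    /** Renders the same content with the stylesheet selected by the style key. */
    static String render(String style) {
        String stylesheet = "text".equals(style) ? TEXT_STYLE : HTML_STYLE;
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(CONTENT)),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("rendering failed", e);
        }
    }
}
```

MultiStyleSketch.render("html") and MultiStyleSketch.render("text") produce two different-looking results from the identical CONTENT string. In a Cocoon site the choice of stylesheet would be made in the sitemap rather than by a string key.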
XSP
As discussed earlier, Cocoon heavily stresses the separation of content, logic, and presentation. If there's one place where logic and presentation are often intermingled, it's in server pages. By definition, both ASP and JSP freely intermix logic and presentation, i.e., source code and HTML. Although the use of beans and taglibs in JSP can minimize this to some extent, there's still inherently some intermingling of logic and presentation, due to the use of HTML.
Cocoon's solution is, once again, elegant: use XML instead of HTML in your server pages. Unlike HTML, XML is presentation-free; it's just data. So writing a server page using XML makes a lot more sense. An XSP page therefore consists of XML data tags, along with intermingled logic (Java code). As with JSP, the Java logic (through the use of either embedded code or calls to external modules) dynamically creates the page to be output. The difference here is that, once again, presentation-free XML is what the logic will generate, not HTML.
All XSP pages are Cocoon generators - the source of XML events in a pipeline. Once the XSP page has executed and generated the appropriate XML stream, the stream is then typically styled and formatted using a Cocoon transformer (e.g., the XSLT transformer using an XSL stylesheet) into the appropriate output format (such as HTML).
Like JSP, XSP pages are compiled into Java code (and then eventually class files), and like JSP, XSP also provides support for tag libraries (often referred to as "logicsheets" in Cocoon). As JSP developers know, calling reusable tag libraries in your pages helps to keep them from becoming too filled up with Java code. Using tag libraries with XSP provides the same benefits.
XSP is too big a topic to discuss in more detail here. (It could easily fill up an entire article on its own.) This should be a good overview though, and you can refer to the Resources section if you'd like to read more about XSP.
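The idea that page logic should emit presentation-free XML events, rather than HTML, can be illustrated in plain Java. The sketch below is not XSP syntax and not the code Cocoon generates from an XSP page; the class name LogicAsGeneratorSketch and the <cases>/<case> element names are assumptions for the example. It shows logic writing SAX events to whatever handler comes next in the chain, here an identity serializer from the JDK.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Illustrative only: Java logic acting as a source of XML events.
class LogicAsGeneratorSketch {

    /** The "logic": emits one <case> element per name, with no presentation markup at all. */
    static void generate(ContentHandler next, List<String> caseNames) throws SAXException {
        AttributesImpl noAttributes = new AttributesImpl();
        next.startDocument();
        next.startElement("", "cases", "cases", noAttributes);
        for (String name : caseNames) {
            next.startElement("", "case", "case", noAttributes);
            next.characters(name.toCharArray(), 0, name.length());
            next.endElement("", "case", "case");
        }
        next.endElement("", "cases", "cases");
        next.endDocument();
    }

    /** Connects the logic to an identity serializer so the events can be inspected as text. */
    static String toXml(List<String> caseNames) {
        try {
            SAXTransformerFactory factory =
                    (SAXTransformerFactory) SAXTransformerFactory.newInstance();
            TransformerHandler serializer = factory.newTransformerHandler();
            serializer.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter text = new StringWriter();
            serializer.setResult(new StreamResult(text));
            generate(serializer, caseNames);
            return text.toString();
        } catch (Exception e) {
            throw new IllegalStateException("generation failed", e);
        }
    }
}
```

Because generate() only knows about a ContentHandler, the same logic could feed a stylesheet-driven stage instead of the serializer, which is the separation the XSP approach is after.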
A Cocoon Case Study
The law firm was looking for a new piece of software to replace the ancient and inflexible software they were currently using and getting increasingly locked into (Microsoft Works...for DOS!). The system functionality was not terribly complex - a basic CRUD system (functionality for create, read, update, and delete) that would provide a user-friendly front end for the legal case files in their database. The entire application would consist of less than a dozen screens. Although the head attorney was fairly computer-savvy (he had recently mocked up a prototype for the new system in MS Access), he was looking to me for technology recommendations and was happy to defer to my knowledge and experience. My first recommendation to him was to choose a Web-based system over an application in MS Access. This was an easy decision for me, as there would be several benefits to be gained from a Web-based system, including ease of development, minimal training required for the rest of the staff, and no software installation or upgrade procedures needed. But which Web development platform to choose was a bigger question. Off the top of my head JSP and Struts were the leading candidates, but I also wanted to consider some newer, more cutting-edge technologies as well. As I had heard of Cocoon before, and had worked heavily with XML on a recent project, I started reading up on Cocoon to see if it would be a good fit. Cocoon's technology was intriguing, and my experience with XML enabled me to come up with an idea that I realized could save a good bit of development time. Since the screens were fairly simple and similar (just rows of fields from the database) I realized that I could design the GUI extremely quickly and easily by just mocking up the screens in XML (see Listing 4, a mock-up of a Web page in XML that will later be rendered using HTML tables). 
Then switching hats and putting myself in "style" mode, I could turn all the screen mock-ups into Web pages just by writing a single stylesheet that would transform the XML into HTML. Each <page> tag could be transformed into a skeleton for an HTML page, each <section> tag could be transformed into an HTML table, and each <row> tag could become a row in the table. The idea appealed to me. What finally clinched the decision to use Cocoon, however, was an additional requirement, one that I initially wasn't sure how to accomplish. Law firms, as we all know, generate reams of documents, and one of the reasons this firm had stuck with MS Works all these years was its ability to mail merge the information from the database into a document. If they were going to abandon Works, the new system would need to provide mail-merge functionality as well. At first, I had no idea how I would provide a mail merge in a Web-based system. But as I thought it through I began to formulate a plan. First of all, I realized the best approach would be for the site to serve up the merged documents as a download from a Web page. This would be simple for the users. Most browsers' "open-attachment" functionality is automatically configured to launch the appropriate application when an attachment is opened, so each time the user generated a mail-merge document, the word processor (MS Word) would automatically launch and open to that document. This would work out quite nicely. But how to do the mail merge itself? Although MS Word has a mail-merge utility, I felt it would be clumsy for the users. It would be much simpler for them if they could just click on a button labeled "create merged document" and have the document arrive with all the merge substitutions already done behind the scenes. What would be required to make this happen? Having the site retrieve the appropriate data from the database was certainly easy enough. 
After that, I reasoned, the template document would need to be read in, the merge fields identified, and the actual data substituted in. Reading an MS Word document was a tall order - I wasn't aware of any Java libraries that could do that. But what about an RTF (Rich Text Format) file? RTF, unlike the Word .doc format, was text-based and would be much easier to read. In fact, I could probably write a parser to do it. I read through the RTF spec and after spending a couple of hours with the JavaCC parser generator, I was able to successfully read RTF documents and find the mail-merge fields in them. I checked with the head attorney to make sure he didn't mind using RTF format instead of Word, and he didn't. As long as they could still use the MS Word application to edit the documents (which they could), he was fine with it. That left me with the last bit: How to substitute the data retrieved from the database for the merge fields? Boy, I thought, it would be nice if there were some existing code that could already do this so I wouldn't have to write it from scratch. What type of software could I use to scan through a document for a particular piece of content and change the value of that content before outputting it again? Then it hit me: I could use an XSL stylesheet! XSL was built to easily handle tasks like this. Suddenly the whole idea began to come together, and my decision to use Cocoon was clinched. I would write a new generator that would parse an RTF file and turn it into a stream of XML events. I would generate a stylesheet in response to each mail-merge request that substituted database values for the merge fields, and I'd use the XSL transformer to apply this stylesheet. Then I would use the text serializer to write out the new mail-merged RTF file data, I'd set the appropriate MIME type for RTF documents ("application/rtf"), and the user would get served up a mail-merged RTF document. I put together another proof-of-concept and, sure enough, it worked.
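The stylesheet-per-request idea can be sketched in a few lines with the JDK's XSLT processor. This is a reconstruction of the general technique only, not the code from the project: the <doc> and <field name="..."/> markup, the class name MailMergeSketch, and the sample values are assumptions, and plain text stands in for the RTF output. Field names are assumed to be simple identifiers, since they are pasted into a match pattern unescaped.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Illustrative only: a mail merge done by generating an XSL stylesheet per request.
class MailMergeSketch {

    /** Escapes a value so it can be embedded safely as text in the generated stylesheet. */
    static String escape(String value) {
        return value.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Builds a stylesheet with one template per merge field; text uses the built-in rules. */
    static String buildStylesheet(Map<String, String> values) {
        StringBuilder xsl = new StringBuilder();
        xsl.append("<xsl:stylesheet version=\"1.0\"")
           .append(" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">")
           .append("<xsl:output method=\"text\"/>");
        for (Map.Entry<String, String> field : values.entrySet()) {
            // Assumes field names are simple identifiers (no quotes or markup).
            xsl.append("<xsl:template match=\"field[@name='")
               .append(field.getKey()).append("']\">")
               .append("<xsl:text>").append(escape(field.getValue())).append("</xsl:text>")
               .append("</xsl:template>");
        }
        return xsl.append("</xsl:stylesheet>").toString();
    }

    /** Applies the generated stylesheet to the template document. */
    static String merge(String templateXml, Map<String, String> values) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer(
                    new StreamSource(new StringReader(buildStylesheet(values))));
            StringWriter merged = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(templateXml)),
                    new StreamResult(merged));
            return merged.toString();
        } catch (Exception e) {
            throw new IllegalStateException("merge failed", e);
        }
    }

    /** Sample run with made-up data standing in for database values. */
    static String demo() {
        Map<String, String> values = new LinkedHashMap<>();
        values.put("client", "Smith & Jones");
        values.put("date", "June 1, 2002");
        return merge("<doc>Dear <field name=\"client\"/>, your hearing is on "
                + "<field name=\"date\"/>.</doc>", values);
    }
}
```

The generated stylesheet needs templates only for the merge fields, because XSLT's built-in rules already copy ordinary text through to the output. A field with no matching value simply produces nothing.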
Writing the application using Cocoon worked out well, though it did have its share of challenges. As the application was not particularly complex, the code was not particularly difficult. The biggest challenge, however, was getting up to speed in some of the new technologies I was using, primarily Cocoon and XSL. I wound up learning them incrementally, as needed, when I hit roadblocks in various pieces of the development. ("Hmmm. How do I do this in XSL?") The online documentation I found for XSL and Cocoon was helpful, as was Michael Kay's book, XSLT Programmer's Reference 2nd Edition. (And, of course, I also posted my share of "Help me!" messages to the Cocoon Users mailing list.) The system was finally completed and installed in December 2001, and got a big thumbs-up from both the users and the head attorney. It is now used daily by the entire staff. From my perspective, I give Cocoon a big thumbs-up. I chose it as the core technology for this project, and it accomplished everything I needed. I found that using XML and Cocoon on the project allowed me to deliver it faster, as well as helping out tremendously in the conception and design phase. While designing I was able to focus completely on what type of content I was going to display on each page and how it would be generated, and completely ignore all presentation and style concerns until a later time. I found this separation of concerns during design to be quite a refreshing change!
Cocoon Concerns
- It's still more difficult to find developers experienced in Cocoon and its related technologies than in more mainstream technologies like JSP or Struts.
- If it doesn't catch on, it may be difficult to continue enhancing and supporting applications built with it.
For More Info
If you're intrigued by what you've read here, I'd encourage you to start using Cocoon. There's no better way to learn than by hands-on development. The best way to approach Cocoon is to start with a small, simple site and build it up incrementally from there, learning as you go. There's also loads of additional documentation about Cocoon available online; in fact, there's probably too much for a novice user. Again, I'd encourage you to approach it incrementally. Read a little at a time, learning more about each of the various components and techniques as you need them.
Resources
XSP Tutorials: