XML and Distributed Computing
By: Boris Lublinsky
Apr. 10, 2001 12:00 AM
There are three big challenges when implementing distributed computing systems: data transfer, interface management, and remote invocation. This article examines how XML can help with each of these, and how XML-based semantic messaging can unify disparate distributed architectures.

Most popular distributed computing models, such as DCE, DCOM, RMI, and CORBA, attempt to present the developer with the standard function/method invocation paradigm, which looks exactly the same as a local invocation:

Foo (int a, char b, double c)

This is usually done by describing the interface using some type of interface definition language (IDL). IDL is independent of operating system, hardware, and programming language, and provides a standard approach for defining the interface. IDL compilers exist for different operating system/programming language combinations, thus allowing for remote invocation across operating systems and programming languages. The code generated by the IDL compiler (together with the underlying middleware services) is responsible for accepting data from the sender in the format the sender uses, and for presenting it to the receiver in a format the receiver understands.

This paradigm is convenient for the developer because it hides the distribution aspect, making it nearly transparent. Unfortunately, the penalty for this convenience is tight coupling between communicating applications. This usually doesn't pose a problem for local communications (within the application itself), but can be a real challenge for multiple applications within the enterprise or, even worse, in interenterprise communications. The most typical problems when using this approach are:
For several years semantic messaging has been suggested as a way to solve these problems. Semantic messaging is defined in many different ways; even execution and transactional semantics have been considered part of the definition. Throughout this article we define a semantic message purely in terms of data semantics - a message should contain data and a definition of what each data element represents. Thus, applications that deal with semantic messages are driven not by the sequence of data or its type, but by naming conventions (data semantics). This paradigm is significantly better suited for implementing data transfer.

The simplest case of semantic messages is name/value pairs, which are used internally by distributed computing models. These types of messages are self-describing in the sense that transferred data is defined not by its position in the data stream, but by the name of the particular piece of data. Self-describing data allows two applications to share data semantics (names), instead of agreeing on an internal data representation and a data sequence within the message.

Instead of parsing incoming information and providing access to every piece (as in the IDL-based approach), a self-describing data approach presents all the input data to the application as a single message. This forces the application to extract the required information by parsing the incoming message. XML enables self-describing structured data of any complexity to be implemented in a uniform fashion using XML documents. The availability of standardized XML parsers and the standard representation of XML documents (DOM) simplifies parsing and extracting XML data on the fly. Although this approach requires more effort than standard distributed computing models, it allows for significantly less tightly coupled applications.
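To make the idea of extracting data by name rather than by position concrete, here is a minimal Java sketch using the standard DOM parser. The class name SemanticMessageReader and the element names Order, Item, and Quantity are assumptions made up for this illustration; they are not taken from the article.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical reader: all class and element names here are illustrative only.
class SemanticMessageReader {

    // Parse an XML string into a DOM document; checked exceptions are wrapped for brevity.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a well-formed XML message", e);
        }
    }

    // Return the text of the first element with the given name, or null if it is absent.
    static String valueOf(Document message, String name) {
        NodeList matches = message.getElementsByTagName(name);
        return matches.getLength() == 0 ? null : matches.item(0).getTextContent();
    }

    static String valueOf(String xml, String name) {
        return valueOf(parse(xml), name);
    }

    public static void main(String[] args) {
        // The same data in two different orders: the position of an element does not matter.
        String first = "<Order><Quantity>3</Quantity><Item>widget</Item></Order>";
        String second = "<Order><Item>widget</Item><Quantity>3</Quantity></Order>";
        System.out.println(valueOf(first, "Item"));
        System.out.println(valueOf(second, "Item"));

        // The receiver converts the text to whatever type it wants to handle.
        int quantity = Integer.parseInt(valueOf(first, "Quantity"));
        System.out.println(quantity + 1);
    }
}
```

Both messages yield the same value for Item even though the elements arrive in a different order, which is the property that lets sender and receiver agree on names alone.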
A typical example of an XML-based semantic message is presented below:

<PurchaseOrder>

By allowing arbitrary structure inside the documents, XML enables the introduction of arbitrarily complex structured data semantics. An additional advantage of XML is that it allows for typeless data transmission, taking variable types out of the equation, so every application converts data to the type it wants to handle.

Another problem of distributed systems is that two applications can share the same data, but use different semantics for it (for example, POID versus PurchaseOrderNumber). Standard XML tools such as XSL processors automate the transformation of the data semantics, simplifying the integration of systems with different vocabularies. Semantic messaging builds significantly less coupled systems for the following reasons:
In our experience, using semantic messages leads to the creation of flexible, loosely coupled systems.
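The vocabulary mismatch mentioned earlier (POID versus PurchaseOrderNumber) can be handled with a small stylesheet. The following is a sketch only, using the standard javax.xml.transform API; the class name VocabularyTranslator, the stylesheet, and the sample document are assumptions for illustration, not the author's implementation.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical translator between two vocabularies for the same data.
class VocabularyTranslator {

    // Identity transform plus one rule that renames POID to PurchaseOrderNumber.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match='POID'>"
        + "<PurchaseOrderNumber><xsl:apply-templates select='@*|node()'/></PurchaseOrderNumber>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Apply the stylesheet to a message; checked exceptions are wrapped for brevity.
    static String translate(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Prints the same document with the POID element renamed.
        System.out.println(translate("<PurchaseOrder><POID>42</POID></PurchaseOrder>"));
    }
}
```

Because the renaming lives in the stylesheet, neither application has to change its own vocabulary; only the mapping between them is maintained.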
Interface Management
Foo1 (double x, int a, char b)

Because these two interfaces essentially support the same functionality, this leads to the creation of very "wide" interfaces that require all feasible parameters to be present. For the example above, this is:

Foo (double x, int a, char b, double c)

This approach reduces the number of interfaces required, but forces every application invoking it to submit all the parameters, including ones it might not have. To solve this problem a system of flags, or something similar, is introduced to signal the absence of certain parameters. Continuing with the above, we'll get something similar to the following:

Foo (double x, int a, char b, double c, boolean type)

where the value of "type" defines the actual type of invocation. The overall approach is equivalent to introducing a proprietary implementation of data semantics. Due to the proprietary nature of this solution, it can't be generalized, and thus requires a new implementation for every interface.

XML-based semantic messages allow a single interface to be created for every required function, with a well-defined signature using input and output parameters in the form of XML documents. For example:

Foo (xmldoc input, xmldoc output)

The input document contains any allowable combination of the input parameters, and the output document contains a combination of the output corresponding to the input parameters received. For the previous example, the input document can be either:

<input>

or:

<input>

or even:

<input>

if the proper set of defaults for b or c exists. In the examples above, the same input document carries whatever data the sender wants to submit for the actual invocation. Because the input document contains the information semantics (names of the data elements), there's no need for any additional "flags" (see example above) defining which information has actually been submitted. The receiving application can obtain this information by traversing the input document.
This approach also simplifies interface maintenance. In the standard distributed environment, every time an interface changes on one of the systems, a corresponding change has to be applied to all the users of this interface. Moreover, all these changes have to be implemented simultaneously, otherwise the overall system will stop functioning. Semantic messaging, which provides a more loosely coupled system, usually simplifies this problem by processing the parameters programmatically within the application itself. As discussed above, a well-defined system of defaults can allow for incremental changes in both sending and receiving applications.
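A minimal Java sketch of the single Foo (xmldoc input, xmldoc output) interface follows. It assumes that the child elements of the input document are named after the parameters (a, b, c, x), that a is required, and that b and c have defaults; the default values and the text written to the output document are invented for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical service exposing one document-based entry point instead of Foo and Foo1.
class FooService {
    // Assumed defaults for parameters the sender may omit (illustrative values).
    static final String DEFAULT_B = "n";
    static final String DEFAULT_C = "1.0";

    // Look a parameter up by name; no flag is needed to know whether it was submitted.
    static String param(Document input, String name, String fallback) {
        NodeList matches = input.getElementsByTagName(name);
        return matches.getLength() == 0 ? fallback : matches.item(0).getTextContent();
    }

    // Single entry point: reads whatever parameters are present, writes the result document.
    static void foo(Document input, Document output) {
        String rawA = param(input, "a", null);
        if (rawA == null) {
            throw new IllegalArgumentException("parameter a is required");
        }
        int a = Integer.parseInt(rawA);
        char b = param(input, "b", DEFAULT_B).charAt(0);
        double c = Double.parseDouble(param(input, "c", DEFAULT_C));
        String x = param(input, "x", "absent");

        Element result = output.createElement("output");
        result.setTextContent("a=" + a + ", b=" + b + ", c=" + c + ", x=" + x);
        output.appendChild(result);
    }

    // Convenience wrapper for the example: XML text in, text of the output element back.
    static String describe(String inputXml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document input = builder.parse(new InputSource(new StringReader(inputXml)));
            Document output = builder.newDocument();
            foo(input, output);
            return output.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid input document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("<input><a>1</a></input>"));
        System.out.println(describe("<input><x>3.5</x><a>1</a><b>q</b></input>"));
    }
}
```

A sender that later starts supplying c, or a receiver that starts honoring a new element, can be deployed independently of the other side as long as sensible defaults remain in place.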
Remote Invocation
The introduction of the proxy and stub, though simplifying the programming model, complicates the overall system and makes it more expensive computationally and memory-wise. In this architecture, remote communications between components are encapsulated in the proxy/stub communications, which are generated by an IDL compiler that's based on the middleware APIs. The sender only communicates with the local proxy, and the receiver gets all the requests from the local stub.

Because proxies and stubs are created in the memory space of the sender and receiver, every connection requires additional memory. In the simple example in Figure 2, a component from the client process connects to four different components in the server process, so the client process contains four proxies and the server process four stubs. If the server supports 100 clients, it holds 400 stubs, which adds up to a significant amount of memory. These stubs also have to be created and destroyed at some point, increasing the overall computational expense.

A separate logical connection is required between each proxy/stub pair. Although most distributed model implementations can share the underlying physical connection (usually TCP/IP), establishing a proxy/stub connection is an expensive operation. As a result, all distributed computing models recommend establishing a connection once and keeping it open for the life of the component on the sender. This synchronizes the life cycles of the components on the receiver and the sender. When the receiver is a server, this reduces scalability (due to the memory usage), which is usually not acceptable. Both the DCOM and EJB specifications introduce intermediate context objects to solve this problem.

XML-based semantic messaging for stateless components provides an elegant solution (see Figure 3). Stateless here means having no conversational state with the receiver.
Stateless components can still have persistent state, which is kept in the database. A "gate" object is introduced that represents all the components in the process. It allows messages to be sent to the process rather than to any particular component of the process. The number of proxy/stub pairs, in this case, is one (for the gate object), unlike four in Figure 2. A single gate accepts all the messages destined for any component within the process and orchestrates the actual execution.

Since XML doesn't impose any limitations on the message semantics, it's possible to place every message in a standard envelope. This envelope includes the component name and the method name of the component to which the message is addressed.

The gate object consists of two major parts (see Figure 4). The message receiver is a listener on a particular protocol (DCOM, CORBA, RMI, etc.). This object is responsible for accepting messages using a predefined communications protocol. After the message is received, the rest of the communications are internal to the receiving process. The message router is responsible for parsing the envelope and extracting the name of the component, the method for which the message is intended, and the message itself. The router then instantiates the appropriate component, passes control to it for execution, and deletes it when processing is complete.

The idea of the gate object is not a new one. It's somewhat similar to the "façade" pattern or the stateless session beans introduced by J2EE. The biggest difference here is using the power and flexibility of XML-based semantic messaging, which standardizes the gate interface and makes the relationship between the gates and supported components dynamic. This is the only architecture that doesn't require rewriting the gate to support additional components or additional methods on the components.
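The router half of the gate can be sketched in Java with reflection. The envelope layout used here (an envelope element with component and method attributes and a body child), the OrderComponent class, and the convention that component methods take the body element and return a string are all assumptions made for this example; the article does not specify them.

```java
import java.io.StringReader;
import java.lang.reflect.Method;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical stateless component; the router has no compile-time knowledge of it.
class OrderComponent {
    public String status(Element body) {
        return "status of order " + body.getTextContent().trim();
    }
}

// Hypothetical router: parses the envelope and dispatches to the named component and method.
class MessageRouter {

    static String route(String envelopeXml) {
        try {
            Element envelope = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(envelopeXml)))
                    .getDocumentElement();

            // Routing information comes from the envelope, not from the gate's interface.
            String componentName = envelope.getAttribute("component");
            String methodName = envelope.getAttribute("method");
            Element body = (Element) envelope.getElementsByTagName("body").item(0);

            // Instantiate the component for this message only; it is discarded afterwards.
            // A real gate would restrict which component names it is willing to load.
            Object component = Class.forName(componentName)
                    .getDeclaredConstructor()
                    .newInstance();
            Method method = component.getClass().getMethod(methodName, Element.class);
            return (String) method.invoke(component, body);
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot route message", e);
        }
    }

    public static void main(String[] args) {
        String message = "<envelope component='" + OrderComponent.class.getName()
                + "' method='status'><body>42</body></envelope>";
        System.out.println(route(message));
    }
}
```

Adding another component or another method requires no change to MessageRouter, since the target is resolved by name at run time.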
Support for introspection, introduced in most modern middleware environments, makes the implementation of routers even simpler.

So far we've discussed using the gate object to "multiplex" access to components within the process. The same architecture can also be used to multiplex access to a distributed system based on certain middleware products. In this case, the gate can serve as both a "gate" and a "bridge," meaning that in addition to multiplexing, it converts messages between multiple middleware types. For example, the gate can accept RMI messages while internal communications are DCOM. Let's analyze the complexities of "bridging" components running on top of different middleware products and see how XML-based semantic messages can help solve them.
Name Resolution
Components Invocation
Data Transformations
Gate-based architecture solves the following problems:
This type of architecture is the foundation of today's Web technology. The Web server can be viewed as a complex gate bridging HTTP communications with the system's internal middleware, whether it's CGI, RMI, CORBA, or DCOM (see Figure 6). Traditionally, routing was based on the URL, and data transfer on semantic name/value pairs submitted to the Web server. Advances in B2Bi and XML are changing this approach by introducing XML-based semantic messaging that incorporates both routing information and data (very similar to the solution discussed above). Examples of implementations of XML semantic messages over the Web are XML and B2B servers and Web services.
Unified Distributed Systems Architecture
The presented architecture lets you bridge multiple systems running on different middleware platforms in a standardized fashion. The advantages of this generic architecture are:
One extension of the described architecture is conversational state support, which can be introduced in a fashion similar to session support in Web servers. The conversational state of the components can be externalized in the form of XML documents and stored in the XML router. When a new request arrives, the router can append state information to the request data. When implementing this architecture, it's important to use XML-based semantic messaging correctly. Although semantic messaging adds flexibility and agility to the overall system, it also usually increases the latency of the overall execution. This is why I suggest using XML-based semantic messages as follows:
Conclusion