Using Data Services to Build Functional Services
Data management infrastructure cuts costs and accelerates development

In almost every significant SOA deployment, a few services have advanced requirements that force those services to intelligently manage the data that they use. A fault-tolerant service might be deployed on a cluster of machines, which means that the instances of the service must share data across several machines.

A high-performance service might use in-memory caching to achieve fast response times. An aggregation service might retrieve or update data from multiple sources, transform that data into a standard representation, and integrate the resulting data in real time. For these kinds of services, data management infrastructure can significantly reduce development effort and deployment risk.

In a SOA, services act as the building blocks for implementing business processes. Each service, hereafter called a functional service, offers a set of operations. The implementations of these operations usually involve querying and updating data from one or more data sources. Data services form the next layer down in the architecture. They support functional services by acting as a high-level abstraction for data: rather than exposing functional services directly to the complexity of data replication, data transformation, and data federation, data services hide those details and present a simple view of enterprise data. Data services expose high-level data manipulation operations, whereas functional services expose business domain-oriented operations. Numerous commercial data management infrastructure products can be used to write data services. These products reduce the cost, improve the reliability, and accelerate the development of data services.
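To make the layering concrete, the following is a minimal Java sketch. The interface and class names (`RoomDataService`, `InMemoryRoomDataService`, `AvailabilityService`) are hypothetical and not taken from any product; a plain in-memory map stands in for whatever replication, caching, or federation a real data service would hide.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical data service: high-level data operations, with storage,
// replication, and federation details hidden behind the interface.
interface RoomDataService {
    int vacancies(String hotelId, String date);
    void setVacancies(String hotelId, String date, int rooms);
}

// Trivial in-memory implementation standing in for real data infrastructure.
class InMemoryRoomDataService implements RoomDataService {
    private final Map<String, Integer> store = new HashMap<>();

    private static String key(String hotelId, String date) {
        return hotelId + "|" + date;
    }

    @Override
    public int vacancies(String hotelId, String date) {
        return store.getOrDefault(key(hotelId, date), 0);
    }

    @Override
    public void setVacancies(String hotelId, String date, int rooms) {
        store.put(key(hotelId, date), rooms);
    }
}

// Hypothetical functional service: a business-domain operation that knows
// nothing about where the data comes from.
class AvailabilityService {
    private final RoomDataService data;

    AvailabilityService(RoomDataService data) {
        this.data = data;
    }

    // Business question: can the requested number of rooms be booked?
    boolean canBook(String hotelId, String date, int roomsRequested) {
        return data.vacancies(hotelId, date) >= roomsRequested;
    }
}
```

Because `AvailabilityService` depends only on the interface, the in-memory implementation could be replaced by a cached or federated one without changing the functional service.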

This article describes several categories of data management infrastructure available today for building data services. A case study from a major hotel chain illustrates how data services were used in a real-world SOA. We'll also discuss the next generation of data management infrastructure, which blends independent data management capabilities into a cohesive platform that delivers consistent, reliable, and pervasive access to data.

Data Access Infrastructure
Enterprise data can originate from a variety of data sources. Relational databases are a common source. But data might also originate from non-relational databases (such as IMS or object-oriented databases), files (such as XML files), applications (such as other functional services), or mainframe resources. The challenge with heterogeneous data sources is that each type of data source offers a distinct interface for retrieving and updating data.

A data access infrastructure can simplify the use of heterogeneous data sources by providing a view of the data that's independent of the underlying data source type. For example, data might originate from relational databases or mainframe resources, but data services could present that data to functional services as Java objects. As data services use new types of data sources, there's little or no impact on the functional services. The functional service simply gets richer data from the data services.
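A minimal sketch of this idea, assuming two hypothetical adapters: one reads relational-style rows (represented here as maps, as if already fetched through JDBC) and one parses XML with the JDK's DOM parser. Both produce the same `Guest` objects, so the caller never sees which kind of source was used. The column names, XML layout, and class names are illustrative assumptions.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Source-independent object handed to functional services.
final class Guest {
    final String id;
    final String name;

    Guest(String id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Hypothetical adapter contract: each source type hides its own access API.
interface GuestSource {
    List<Guest> loadGuests();
}

// Adapter for relational-style rows; in practice these would come from JDBC.
final class RowGuestSource implements GuestSource {
    private final List<Map<String, Object>> rows;

    RowGuestSource(List<Map<String, Object>> rows) {
        this.rows = rows;
    }

    @Override
    public List<Guest> loadGuests() {
        List<Guest> guests = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            guests.add(new Guest(String.valueOf(row.get("GUEST_ID")),
                                 String.valueOf(row.get("FULL_NAME"))));
        }
        return guests;
    }
}

// Adapter for XML shaped like <guests><guest id="..." name="..."/></guests>.
final class XmlGuestSource implements GuestSource {
    private final String xml;

    XmlGuestSource(String xml) {
        this.xml = xml;
    }

    @Override
    public List<Guest> loadGuests() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("guest");
            List<Guest> guests = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                guests.add(new Guest(e.getAttribute("id"), e.getAttribute("name")));
            }
            return guests;
        } catch (Exception e) {
            throw new IllegalStateException("cannot read guest XML", e);
        }
    }
}
```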

Several vendors offer infrastructure for accessing data sources. For example, there are products that provide a JDBC interface to mainframe applications, so that data services can use a familiar interface to query or update the mainframe. Similarly, XQuery products make it easier to manipulate data stored in XML files. Data services can exploit these data access products to connect to a broad range of data source types.
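The JDK does not ship an XQuery processor, so the sketch below substitutes XPath (via `javax.xml.xpath`), the path language that XQuery builds on, to query a small XML rate file. The `<ratePlans>` document layout and the `RatePlanLookup` class are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class RatePlanLookup {
    // Returns the nightly rate of the named plan, or NaN if no such plan exists.
    // The plan code is spliced into the expression, so it must come from a trusted source.
    static double nightlyRate(String xml, String planCode) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        String expr = "/ratePlans/plan[@code='" + planCode + "']/@nightly";
        try {
            return (Double) xpath.evaluate(expr, new InputSource(new StringReader(xml)),
                                           XPathConstants.NUMBER);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad query for plan " + planCode, e);
        }
    }
}
```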

Data Replication Infrastructure
While data sources serve as the definitive origin and repository for enterprise data, a data service often needs a local copy of the data. For example, if a data service needs to query data rapidly, it might use an in-memory cache to store a local copy of frequently used data. To recover quickly from process failures, the data service might write a copy of the cache to disk, so that it can reconstruct the cache without having to query the original data sources. If a data service needs to operate reliably even when a data source is unavailable, it could maintain a local copy of the data so that processing can continue without the primary data source.
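The following sketch, assuming a hypothetical `ReadThroughCache` class with the primary data source modeled as a `Function`, shows two of the motivations above: repeated reads are served from the local copy, and when the source fails the data service keeps working from the last value it copied. Disk backup for crash recovery is omitted here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical local copy in front of a slower primary data source.
final class ReadThroughCache {
    private final Map<String, String> copy = new HashMap<>();
    private final Function<String, String> origin; // lookup against the primary source
    int originCalls = 0;                            // exposed for illustration only

    ReadThroughCache(Function<String, String> origin) {
        this.origin = origin;
    }

    // Fast path: serve from the local copy; only a miss touches the source.
    String get(String key) {
        String cached = copy.get(key);
        if (cached != null) {
            return cached;
        }
        originCalls++;
        String fresh = origin.apply(key);
        if (fresh != null) {
            copy.put(key, fresh);
        }
        return fresh;
    }

    // Re-read a key from the source; if the source is unavailable (signalled
    // here by a RuntimeException), keep serving the existing local copy.
    String refresh(String key) {
        originCalls++;
        try {
            String fresh = origin.apply(key);
            if (fresh != null) {
                copy.put(key, fresh);
            }
        } catch (RuntimeException sourceUnavailable) {
            // Processing continues on the last known value.
        }
        return copy.get(key);
    }
}
```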

Making a local copy of data is easy. Ensuring that the copy stays up-to-date is more challenging. A data replication infrastructure can handle both the initialization and the subsequent updating of the local copy automatically. Such infrastructure can offer a spectrum of "qualities of service" (QoS) to meet the varied requirements of data services. For example, a local copy might simply be a snapshot of a data source that's periodically refreshed to reflect recent changes. Or the local copy might be continually updated via distributed transactions so that the copy is guaranteed to be identical to the original data source. Another QoS is whether the local copy is writable or read-only. Yet another is whether the local copy is recoverable after a process failure (because it's backed up to disk) or not (because it's stored only in volatile memory).
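One way a data service might declare these qualities of service is sketched below. The enums, `ReplicaQoS`, and `Replica` are hypothetical names, and the replica enforces only two behaviors (snapshot refresh and read-only access); it illustrates the declarative idea rather than any product's API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

enum Freshness { PERIODIC_SNAPSHOT, SYNCHRONOUS }
enum Access { READ_ONLY, WRITABLE }
enum Durability { VOLATILE, DISK_BACKED }

// Hypothetical QoS declaration a data service hands to the replication layer.
final class ReplicaQoS {
    final Freshness freshness;
    final Access access;
    final Durability durability;

    ReplicaQoS(Freshness freshness, Access access, Durability durability) {
        this.freshness = freshness;
        this.access = access;
        this.durability = durability;
    }
}

// Minimal replica that enforces the access mode and supports snapshot refresh.
// Synchronous (transactional) updates, write propagation back to the origin,
// and disk backing are deliberately left out of this sketch.
final class Replica {
    private final ReplicaQoS qos;
    private final Supplier<Map<String, String>> snapshotOfOrigin;
    private Map<String, String> copy = new HashMap<>();

    Replica(ReplicaQoS qos, Supplier<Map<String, String>> snapshotOfOrigin) {
        this.qos = qos;
        this.snapshotOfOrigin = snapshotOfOrigin;
        refresh(); // initialize the local copy
    }

    // For PERIODIC_SNAPSHOT, a scheduler would call this on a fixed interval.
    void refresh() {
        copy = new HashMap<>(snapshotOfOrigin.get());
    }

    String read(String key) {
        return copy.get(key);
    }

    void write(String key, String value) {
        if (qos.access == Access.READ_ONLY) {
            throw new UnsupportedOperationException("replica is read-only");
        }
        copy.put(key, value);
    }
}
```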

Not only does replication infrastructure unburden a data service from the drudgery of synchronizing data, it also provides the data service with a high-level abstraction for managing data. The data service merely declares its QoS requirements and the replication infrastructure hides the complexity of synchronization in accordance with those requirements. These powerful abstractions accelerate the development of data services and they ensure that data services have reliable access to replicated data.

Data warehouse technology is commonly used for disk-based data replication. Traditional data warehouses use a batch-oriented approach to initialize and update the local copy. Real-time data warehouses support continuous incremental updates to the local copy, so the local copy stays nearly synchronized with the original data source. Both traditional and real-time data warehouses produce a read-only local copy of data: updates to the copy are disallowed because they would not be propagated back to the original data sources.
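The difference between batch reloading and continuous incremental updates can be sketched as applying ordered change events to a local copy. `ChangeEvent` and `IncrementalCopy` are hypothetical names, and the sequence-numbered change feed is an assumption about how the source publishes its changes. Consumers get an unmodifiable view, reflecting the read-only nature of warehouse copies.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical change record captured from the source database's change log.
final class ChangeEvent {
    enum Op { UPSERT, DELETE }

    final long seq;
    final Op op;
    final String key;
    final String value; // null for DELETE

    ChangeEvent(long seq, Op op, String key, String value) {
        this.seq = seq;
        this.op = op;
        this.key = key;
        this.value = value;
    }
}

// Read-only local copy kept nearly in sync by applying small batches of changes.
final class IncrementalCopy {
    private final Map<String, String> rows = new HashMap<>();
    private long lastAppliedSeq = 0;

    // Assumes each batch is ordered by sequence number; events that were
    // already applied (e.g. redelivered after a retry) are skipped.
    void apply(List<ChangeEvent> batch) {
        for (ChangeEvent e : batch) {
            if (e.seq <= lastAppliedSeq) {
                continue;
            }
            if (e.op == ChangeEvent.Op.DELETE) {
                rows.remove(e.key);
            } else {
                rows.put(e.key, e.value);
            }
            lastAppliedSeq = e.seq;
        }
    }

    // Unmodifiable view: local edits could never reach the original source.
    Map<String, String> view() {
        return Collections.unmodifiableMap(rows);
    }

    long lastAppliedSeq() {
        return lastAppliedSeq;
    }
}
```

A traditional batch warehouse would instead discard `rows` and reload everything on each run.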

Distributed in-memory caching, which automatically synchronizes data across a group of high-speed caches, is another example of replication infrastructure. Each cache is typically deployed directly within an application process, which gives the application instant access to the cache's data. This infrastructure accelerates data services by provisioning data directly into the application's address space, with the limitation that the local copy, held in volatile memory, isn't fault-tolerant.
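A single-process simulation of a cache group is sketched below. In a real distributed cache the peers live in different processes and updates travel over the network; here `CacheGroup` and `PeerCache` (hypothetical names) are plain objects and propagation is a synchronous method call. Reads never leave local memory, which is where the speed comes from, and nothing is written to disk, which is the fault-tolerance limitation noted above.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical group membership and update propagation.
final class CacheGroup {
    private final List<PeerCache> members = new CopyOnWriteArrayList<>();

    PeerCache join() {
        PeerCache peer = new PeerCache(this);
        members.add(peer);
        return peer;
    }

    // Push an update to every member except the one that made it.
    void broadcast(PeerCache origin, String key, String value) {
        for (PeerCache peer : members) {
            if (peer != origin) {
                peer.applyRemote(key, value);
            }
        }
    }
}

// One cache embedded in an application process.
final class PeerCache {
    private final Map<String, String> local = new ConcurrentHashMap<>();
    private final CacheGroup group;

    PeerCache(CacheGroup group) {
        this.group = group;
    }

    // Reads come straight from local memory: no network hop, no disk.
    String get(String key) {
        return local.get(key);
    }

    void put(String key, String value) {
        local.put(key, value);
        group.broadcast(this, key, value);
    }

    void applyRemote(String key, String value) {
        local.put(key, value);
    }
}
```

This sketch does not transfer existing entries to a peer that joins late, and it has no conflict handling; production caches address both.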

Another data replication technology can provision data for disconnected mobile applications. This represents a powerful abstraction because a data service simply relies on the replication technology to manage the complexity of synchronizing data whenever a mobile computer is connected to the network. The data service is mostly unaware of whether the machine on which it's deployed is connected to the network, shifting the burden of data management to the replication infrastructure.
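The sketch below, using a hypothetical `MobileReplica` class and a plain map standing in for the central data source, queues writes while the device is offline and, when connectivity returns, pushes them and then pulls the server's state. The data service calls `put` and `get` the same way in both states. The last-writer-wins behavior is a simplifying assumption; real synchronization products offer conflict detection and resolution.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical replica on a mobile device that is sometimes offline.
final class MobileReplica {
    private final Map<String, String> local = new HashMap<>();
    private final Deque<String[]> pending = new ArrayDeque<>(); // {key, value} not yet sent
    private final Map<String, String> server; // stand-in for the central data source
    private boolean connected = false;

    MobileReplica(Map<String, String> server) {
        this.server = server;
    }

    // Writes always succeed locally; they are queued until a connection exists.
    void put(String key, String value) {
        local.put(key, value);
        pending.addLast(new String[] {key, value});
        if (connected) {
            flush();
        }
    }

    String get(String key) {
        return local.get(key);
    }

    // On reconnect: push queued writes in order, then pull the server's state.
    void setConnected(boolean nowConnected) {
        connected = nowConnected;
        if (connected) {
            flush();
            local.putAll(server);
        }
    }

    int pendingCount() {
        return pending.size();
    }

    private void flush() {
        while (!pending.isEmpty()) {
            String[] write = pending.removeFirst();
            server.put(write[0], write[1]); // last writer wins (simplifying assumption)
        }
    }
}
```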

Data Integration Infrastructure
Occasionally, a data source stores data in exactly the format required by the data service that uses it. More often, the storage format doesn't match the requirements of the data service, so some data transformation is required. For example, suppose a data service that's implemented in an object-oriented programming language needs to use data from a relational database. Object-relational mapping is a standard kind of infrastructure for performing this transformation. It automates the translation of data from the rows-and-columns representation of a relational database to the instances-and-attributes representation of an object-oriented programming language. It also eliminates hand-coded database queries, optimizes the use of database connections, and offers a high-level abstraction for data manipulation.
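To illustrate the row-to-object translation, here is a toy reflective mapper. It is a sketch under stated assumptions (column names in `UPPER_SNAKE_CASE`, matching camelCase fields, and compatible value types), and `RowMapper` and `RoomType` are hypothetical; real object-relational mapping frameworks also generate queries, manage connections, and track changes.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class RowMapper {
    // Column name to field name, e.g. MAX_OCCUPANCY -> maxOccupancy.
    static String toFieldName(String column) {
        StringBuilder sb = new StringBuilder();
        boolean upperNext = false;
        for (char ch : column.toLowerCase().toCharArray()) {
            if (ch == '_') {
                upperNext = true;
                continue;
            }
            sb.append(upperNext ? Character.toUpperCase(ch) : ch);
            upperNext = false;
        }
        return sb.toString();
    }

    // Build one object per row by assigning each column to the matching field.
    static <T> List<T> map(List<Map<String, Object>> rows, Class<T> type) {
        List<T> out = new ArrayList<>();
        try {
            for (Map<String, Object> row : rows) {
                T obj = type.getDeclaredConstructor().newInstance();
                for (Map.Entry<String, Object> col : row.entrySet()) {
                    Field f = type.getDeclaredField(toFieldName(col.getKey()));
                    f.setAccessible(true);
                    f.set(obj, col.getValue());
                }
                out.add(obj);
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("row does not match " + type.getName(), e);
        }
        return out;
    }
}

// Hypothetical domain class; needs a no-argument constructor.
class RoomType {
    String hotelId;
    String roomType;
    Integer maxOccupancy;
}
```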

But consider a more complex example. Suppose a data service needs to aggregate data from multiple relational databases, and suppose that those databases use different schemas. In this example, not only must relational data be converted to an object representation, but the schema differences have to be reconciled too. Data originating from the different data sources has to be integrated into a common representation that's easy for the data service to use.

Infrastructure for data integration and data federation can address these challenges. For example, enterprise information integration products provide a data service with a tailored view of data, where the data originates from distributed data sources with different schemas and data types. This infrastructure is valuable for building data services because it abstracts the underlying data sources and presents a unified data representation that's independent of the format and location of the data.
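A small sketch of schema reconciliation follows. Two hypothetical sources describe the same fact differently: one stores a property code and the rate in integer cents (`PROP_CODE`, `RATE_CENTS`), the other a lowercase hotel code and a dollar amount as text (`hotel`, `rate`). `RateFederator` converts both into one `NightlyRate` representation. All names and schemas are invented for illustration.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Common representation presented to the data service.
final class NightlyRate {
    final String hotelId;
    final BigDecimal amount; // dollars

    NightlyRate(String hotelId, BigDecimal amount) {
        this.hotelId = hotelId;
        this.amount = amount;
    }
}

final class RateFederator {
    // Source A (hypothetical schema): PROP_CODE, RATE_CENTS as an integer.
    static NightlyRate fromSourceA(Map<String, Object> row) {
        long cents = ((Number) row.get("RATE_CENTS")).longValue();
        return new NightlyRate((String) row.get("PROP_CODE"), BigDecimal.valueOf(cents, 2));
    }

    // Source B (hypothetical schema): lowercase hotel code, rate as decimal text.
    static NightlyRate fromSourceB(Map<String, Object> row) {
        String hotel = ((String) row.get("hotel")).toUpperCase(Locale.ROOT);
        return new NightlyRate(hotel, new BigDecimal((String) row.get("rate")));
    }

    // Integrate both sources into one list in the common representation.
    static List<NightlyRate> federate(List<Map<String, Object>> a, List<Map<String, Object>> b) {
        List<NightlyRate> out = new ArrayList<>();
        for (Map<String, Object> row : a) {
            out.add(fromSourceA(row));
        }
        for (Map<String, Object> row : b) {
            out.add(fromSourceB(row));
        }
        return out;
    }
}
```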

Case Study
A major hotel chain embarked on a large-scale effort to rebuild its reservation systems as an SOA. As part of that effort, it identified two critical functional services that required data services: Availability and Permissibility. The Availability service must rapidly respond to inquiries about room vacancies on particular dates at specific hotel properties. The Permissibility service evaluates whether reservations conform to rate plan rules. Both services are written in an object-oriented programming language, retrieve data from a relational database, support very high transaction volumes, and must be highly available.

To build the data services supporting the Availability and Permissibility functional services, the hotel chain needed data infrastructure that solved several problems. There was a mismatch between the relational representation used to store data and the object-oriented index data structures used for computation. Data had to be synchronized across the cluster of processes that implemented each functional service. And the functional services required instant access to data to deliver fast responses to requests.

To build its data services, the hotel chain selected Progress ObjectStore Enterprise from Progress Software. ObjectStore acts as a high-performance distributed durable cache for object-oriented data. It offers transparent storage of object-oriented data structures; it delivers data automatically to distributed in-memory caches; it guarantees strong transactional consistency semantics; and it provides instant access to data.

One key challenge overcome by the data services was to transform data from a representation optimized for storage into a different representation optimized for computation. The relational representation of data was normalized and semantically complete, but very inconvenient to manipulate. To answer a typical query about room availability, an expensive multi-table join operation was required. In response, they developed an optimized index structure that contained the same semantic information as the relational database, but in a different format that could efficiently answer queries.
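The article does not describe the hotel chain's actual index structure, so the following is only a hypothetical illustration of the general idea: scan two normalized tables (room inventory and bookings) once, and precompute the remaining rooms per hotel, date, and room type, so that an availability inquiry becomes a pair of hash lookups instead of a multi-table join. `RoomCount` and `AvailabilityIndex` are invented names.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in for a row of a normalized table (hypothetical schema).
final class RoomCount {
    final String hotel;
    final String date;
    final String roomType;
    final int rooms;

    RoomCount(String hotel, String date, String roomType, int rooms) {
        this.hotel = hotel;
        this.date = date;
        this.roomType = roomType;
        this.rooms = rooms;
    }
}

final class AvailabilityIndex {
    // "hotel|date" -> room type -> rooms still available
    private final Map<String, Map<String, Integer>> index = new HashMap<>();

    private static String key(String hotel, String date) {
        return hotel + "|" + date;
    }

    // One pass over both tables replaces the join that each query used to run.
    AvailabilityIndex(List<RoomCount> inventory, List<RoomCount> bookings) {
        for (RoomCount r : inventory) {
            index.computeIfAbsent(key(r.hotel, r.date), k -> new HashMap<>())
                 .merge(r.roomType, r.rooms, Integer::sum);
        }
        for (RoomCount b : bookings) {
            index.computeIfAbsent(key(b.hotel, b.date), k -> new HashMap<>())
                 .merge(b.roomType, -b.rooms, Integer::sum);
        }
    }

    // An availability inquiry becomes two hash lookups.
    Map<String, Integer> availability(String hotel, String date) {
        return Collections.unmodifiableMap(
            index.getOrDefault(key(hotel, date), Collections.emptyMap()));
    }
}
```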

The initial construction of the index structure required a full scan of the relational database, a process that took several hours. Once the index structure was built, updates to the relational database required corresponding incremental updates to the index structure. It was mandatory to ensure that the index structure was backed up to disk so that a temporary failure of the data service wouldn't result in a lengthy outage to reconstruct the index.
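Continuing the hypothetical illustration, the sketch below applies incremental changes to an index and backs it up to disk with standard Java serialization, so a restarted process can reload it instead of rescanning the database. `DurableAvailabilityIndex` is an invented name; it writes a full snapshot on each save and is not transactional, whereas the case study relied on a commercial product for transactional, durable storage.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;

final class DurableAvailabilityIndex implements Serializable {
    private static final long serialVersionUID = 1L;

    // "hotel|date|roomType" -> rooms still available
    private final HashMap<String, Integer> remaining = new HashMap<>();

    private static String key(String hotel, String date, String roomType) {
        return hotel + "|" + date + "|" + roomType;
    }

    // Incremental update mirroring one change committed to the relational database.
    void applyDelta(String hotel, String date, String roomType, int delta) {
        remaining.merge(key(hotel, date, roomType), delta, Integer::sum);
    }

    int available(String hotel, String date, String roomType) {
        return remaining.getOrDefault(key(hotel, date, roomType), 0);
    }

    // Back up the whole index so a restart does not require a full rescan.
    void saveTo(File file) {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(this);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static DurableAvailabilityIndex loadFrom(File file) {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (DurableAvailabilityIndex) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A production version would at least write to a temporary file and rename it, so that a crash during a save cannot corrupt the previous backup.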

The use of a data management infrastructure enabled the hotel chain to build data services that supported the operational requirements of the Availability and Permissibility functional services. While they contemplated building an in-house implementation of data management infrastructure, they quickly determined that it was quicker, more cost-effective, and less risky to buy a commercial infrastructure product instead.


About Vivek Singhal
Vivek Singhal is Vice President of Technology for the Progress Real Time Division. He leads a team responsible for defining the architecture for the Real Time Data Services products. Previously, Vivek was Vice President of Engineering at Persistence Software, responsible for product development and product management. During his 10 years at Persistence (which was acquired by Progress in November 2004), he designed and patented several of the core technologies of Persistence's enterprise data management products. He defined and evangelized the product strategy, orchestrated the activities of the engineering team, and led the delivery of numerous product releases.
