The adoption of Service Oriented Architecture (SOA) promises to further decouple monolithic applications by decomposing business functions and processes into discrete services. While this makes enterprise computing assets more accessible and reusable, SOA implementation patterns are primarily an iteration over previous application development models. Like most application development evolutions, SOA approaches inject more layers and flexibility into the application tier, but have often neglected the most fundamental building block of all applications: the underlying data.
Current Data Environment of Most IT Organizations
The condition of a typical organization's data environment is usually not where it needs to be before the organization can begin an SOA transformation. From an enterprise perspective, there's often a lack of authoritative sources and a wide array of technologies used for storing and processing data. Generally, there's no single system that offers a complete view of the organization's core business objects, since most large IT organizations have their core enterprise data spread out and replicated across multiple stove-piped systems. Each system in an enterprise often maintains data within its specific context rather than the context of the enterprise. Data quality and interoperability issues abound, especially when data-consuming systems access a variety of data-producing systems, each of which maintains an isolated view of enterprise data. These differences lead to inconsistencies and inaccurate views of the business processes. Figure 1 illustrates these data access and management challenges impacting SOA transition initiatives.
An SOA transformation amplifies and exacerbates an organization's existing data problems. Because of the integrated nature of SOA-based applications, an organization will be building on top of a very weak foundation unless it first addresses the issues with its current data environment. This is, in many ways, analogous to constructing a high-rise building on top of a landfill.
Consider the lack of authoritative enterprise sources as an illustrative example. Suppose that in an organization's supply chain systems' portfolio there are five systems that hold supplier information internally. Each of these can be considered a legitimate source of supplier data within the owning department. When building a service to share supplier data, where should the source of supplier data be?
Target Vision of Data Environment in an SOA
The way an organization thinks about applications and data must evolve - it must stop thinking about data as a second-class citizen that only supports specific applications and begin to recognize data as a standalone asset that has both value and utility. Organizations should establish their data environments with "hubs of specific data families" that expose data services that comply with industry standards and service contracts. The goal is to create a set of services that becomes the authoritative way to access enterprise data. In this target service-oriented environment, applications and data work together as peers. Thus, both an organization's business functionality and data can be leveraged as enterprise assets that are reusable across multiple departments and lines of business. This target vision, illustrated in Figure 2, enables the following desired characteristics of the enterprise's data environment:
Data Governance
Governance is often cited as an important part of SOA. However, this generally refers to the governance of services and not the data shared through services. Just as proper governance of services is critical to an SOA, proper governance of the data is equally, if not more, important. Many of the problems associated with an organization's data environment can't be solved through technology solutions alone. Decisions and policies must be issued at the organizational level that can then be implemented through the technology. For example, the absence of an enterprise data ownership concept is a classic data governance issue. Different divisions in an organization control the data within their own system boundaries. They can make changes to that data as they see fit and these changes can ripple across other divisions and ultimately impact the interoperability of the enterprise as a whole.
Without a definition of enterprise ownership and stewardship of the data, controlling such changes is difficult. So an SOA data strategy should include establishing an enterprise data management function as the data governance mechanism. A centralized management function is needed to treat data as an enterprise asset instead of as the assets of individual departments. The group responsible for this function addresses data issues and establishes policies and processes that cut across multiple departments. The responsibilities of such a group should include:
How does an organization make this shift? First, it must decide on its "core" data families, which are sometimes also referred to as "master" data. Core data is relatively easy to deduce, given a general understanding of the key business processes. For example, the "supplier" data family in a supply chain business could be considered core data. While it may be tempting to model every core data family in full detail, it may be wiser to identify "a good first set" and begin with that. A good approach is to simply tackle the obvious core data families first, learn from the experience, and then apply those lessons learned to model the rest of the data.
Next, the organization must decide which data elements in the enterprise model are strategic for operational, reporting, and accountability purposes, and which are relevant only to one or several subsets of the business. Common strategic data should be thought of as the subset of fields that any application that uses this data family can fully support. All the other data attributes should be considered "optional," even if they're critical to certain applications. In the supply chain example, "supplier performance" isn't part of the core data but it may be critical to one or two systems in the organization. Since the enterprise data model is applicable to the entire organization, the standard would always expect the core data to be provided. It should also give each system the flexibility to be extended with additional data that's relevant to its own purpose.
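One way to picture the split between core and optional attributes is a record type whose core fields are mandatory and whose system-specific attributes travel in an open extension map. The following is a minimal sketch, not a prescribed model; the field names (`supplier_id`, `legal_name`, `country`) and the `extensions` map are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class Supplier:
    """Hypothetical enterprise record for the "supplier" data family."""

    # Core attributes: every system using this data family must supply them.
    supplier_id: str
    legal_name: str
    country: str
    # Optional attributes: relevant only to some systems, keyed by name.
    extensions: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Enforce the rule that core data is always provided.
        for name in ("supplier_id", "legal_name", "country"):
            if not getattr(self, name):
                raise ValueError(f"core attribute {name!r} is required")


# One system attaches its own "supplier performance" score
# without changing the enterprise-wide core.
record = Supplier("S-001", "Acme Ltd", "US",
                  extensions={"supplier_performance": 0.93})
```

Keeping the optional attributes in a separate map means a consuming system that only understands the core fields can ignore the rest without breaking.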
Technical Considerations
SOA implementations usually exhibit decentralized federated topologies. So merging data properly and enabling authorized enterprise-wide access are both necessary to ensure that information can be leveraged to support enterprise objectives. Enabling these capabilities presents numerous challenges in the areas of data quality, security, and the data services architecture. The next sections describe some of these challenges and provide recommendations for addressing them.
Data Quality
SOA initiatives often focus on the implications of connecting disparate systems. A fundamental concern of implementing such connections is how to ensure that the data exchanged is accurate, meaningful, and understandable by all participating parties. Users, consuming services, and data sources all operate as peers in an SOA. These peers will often use data in new and unanticipated ways. So it becomes increasingly difficult to serve meaningful information without normalizing existing data assets. This includes not just schematic normalization, but instance-level de-confliction as well.
To this end, data quality studies are paramount in ensuring the success of an SOA implementation. Typically this includes understanding what data is available, where this data is located, and what state it's in:
Resolving Data Conflicts
Besides these scoping and profiling exercises to manage data quality, it's also imperative to resolve value-level conflicts that exist in the data. These conflicts can be categorized into three major types (C.H. Goh, "Representing and Reasoning about Semantic Conflicts in Heterogeneous Information Systems," Sloan School of Management, Massachusetts Institute of Technology, 16-22, January 1997):
These data conflicts can often be addressed by using commercial data management tools and methodologies, as well as enterprise data modeling software. Another emerging possibility is semantics-centric modeling environments. Instead of hard-coding data cleansing routines, these tools use a semantic description of the enterprise - the business concepts and relationships between those concepts, as well as any business rules governing the relationships - and provide a mechanism to describe how legacy systems support the semantics of the enterprise. This useful abstraction lets the enterprise deterministically identify how each enterprise data asset supports the enterprise business functions, as well as any gaps between the enterprise semantic model and the underlying data representation schemes. This modeling approach can then be used to determine where physical data conflicts or duplications may exist, as well as forward engineer data consolidation and cleansing scripts.
Data Access Controls
In traditional application architectures, data access security is typically governed by application-specific mechanisms. In this environment, each source has its own set of users, roles, and access control policies, which means that user profiles, roles, and access control policies lack consistency across the enterprise. An SOA environment magnifies this problem by making data sources visible across the organization. So it becomes increasingly important to move away from individual application-specific and data source-specific mechanisms in favor of enterprise-level SOA identity management and access control mechanisms.
This means that when creating the central data services layer, the data sources must rely on central provisioning of some security functions so they can be managed centrally. The challenge is in finding the right balance between the security functions that should be managed centrally and those that should be managed as part of the data sources. There are several options for implementing such a scheme, including a centrally managed data security layer or layered authorization through multiple policy decision points (PDPs).
With the central management option, the data sources relinquish security and rely solely on the data services to protect the access to their data. Within each data source, a single user profile is created for the data service that has full access to the data. Any request to the data through this service is authorized through this user profile. So there's no longer a concern about whether the principal's identity from the overarching security domain exists or means anything in the data source. However, this option pushes security checks into the data service layer and reduces the granularity of accountability. As a consequence, any access control policies from the data source along with the associated roles and privileges should now be re-created and maintained at the central enterprise points.
In contrast, layering the use of multiple policy decision points encourages the reuse of existing authorization capabilities, user profiles, and access control policies of the underlying data sources. This approach allows some of the more fine-grained access control decisions to be made at the data sources rather than elevating them into the enterprise layer. Although many variations exist for this design, the premise is that different layers of authorization with multiple PDPs are making the decisions. The basic flow of this approach is as follows: Authentication still occurs at the edge using enterprise authentication services. Requests for data originate at different security domains in the enterprise. A PDP in each of these domains evaluates requests for resources in that domain. When a data service is invoked it calls the enterprise policy decision point to authorize access to the data service as well as the specific operation requested. The data service then delegates the decision to each data source so they can authorize access to their specific data object(s). Thus, coarse-grained decisions are made at the enterprise level while finer-grained decisions use data source-specific profiles and policies that aren't exposed to the enterprise.
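The layered flow described above can be sketched in a few lines. This is only an illustration under stated assumptions: each PDP is modelled as a plain function, and the principals, operations, and policies (`alice`, `bob`, the `finance` and `catalog` sources) are hypothetical. A real deployment would delegate these decisions to its own policy infrastructure.

```python
from typing import Callable, Dict, List

# A policy decision point (PDP) is modelled as a function answering:
# "may this principal perform this operation on this resource?"
Pdp = Callable[[str, str, str], bool]


def enterprise_pdp(principal: str, operation: str, resource: str) -> bool:
    """Coarse-grained check: may the principal call this service operation?"""
    allowed = {("alice", "read"), ("bob", "read"), ("alice", "update")}
    return (principal, operation) in allowed


def finance_source_pdp(principal: str, operation: str, resource: str) -> bool:
    """Fine-grained check owned by one data source (hypothetical policy)."""
    return principal == "alice"


def open_source_pdp(principal: str, operation: str, resource: str) -> bool:
    """A data source whose local policy admits any enterprise-authorized caller."""
    return True


def authorize(principal: str, operation: str, resource: str,
              source_pdps: Dict[str, Pdp]) -> List[str]:
    """Return the names of the data sources allowed to serve the request."""
    # Layer 1: the enterprise PDP authorizes the service and operation.
    if not enterprise_pdp(principal, operation, resource):
        return []
    # Layer 2: each data source applies its own, unexposed policy.
    return [name for name, pdp in source_pdps.items()
            if pdp(principal, operation, resource)]


sources = {"finance": finance_source_pdp, "catalog": open_source_pdp}
```

With these sample policies, `authorize("bob", "read", "supplier", sources)` yields only the `catalog` source: the enterprise layer admits the call, but the finance source's own policy still withholds its data.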
Data Services Architecture
From an architectural
perspective, the heart of this solution is an enterprise layer that
logically centralizes access to the data spread across the enterprise.
This set of logically centralized data services provides several
architectural advantages. First, the enterprise can assert greater
control over the governance and implementation of data access
mechanisms. Second, clients use a consistent mechanism to access data.
Third, the enterprise can design and implement a solution in a holistic
fashion instead of the typical one-off models that are the norm in data
integration. Finally, besides the basic Create, Read, Update, and
Delete (CRUD) operations, the underlying architecture must also support
data aggregation, inter-service transactions, and multiple access and
usage patterns, all while ensuring acceptable levels of quality of
service.
Data Aggregation Scenarios
This data services
layer acts as a façade over the enterprise assets - it logically
provides access to enterprise data assets in a singular manner, while
physically dispatching requests and aggregations across relevant
co-located assets. Three main scenarios should be considered for data
aggregation:
Some of these aggregation capabilities can be supported through Enterprise Information Integration (EII) technology, which provides SOA-centric capabilities for accessing and querying co-located data in real-time. EII products provide adapters to legacy data sources and expose their underlying data in a service-oriented fashion. EII is best used in discrete query-based mechanisms where data volumes are moderate. EII isn't meant to be a replacement for traditional ETL (extract, transform, load), EAI (enterprise application integration), or MDM (master data management) technologies. For example, some of the aggregation scenarios requiring de-duplication capabilities can require the use of MDM technologies.
The data services layer allows creates and updates to be requested once by a client and then decomposed by the supporting architecture into individual write commands to targeted data sources. Therefore, the architecture must support transactionality - ensuring that writes are consistent so that underlying data across all affected data sources are left in a consistent state. This isn't significantly different from current data integration pains. However, most systems today requiring multi-write transaction capabilities leverage the XA standards. Similar standards for the Web Services environment are only starting to emerge. OASIS has recently formed a Web Services Transaction Technical Committee (WS-TX TC) responsible for stewarding WS-AtomicTransaction, WS-Coordination, and WS-BusinessActivity specifications through the standardization process. None of these standards have been ratified yet. Because these specifications are still being developed, most SOA-related transaction support is being custom-developed, typically through the use of homegrown compensation mechanisms - effectively an "undoing" of a previously executed service invocation. Instead of providing true rollback semantics, compensation is an additional service invocation that rewrites data to its original state. While it may be beneficial to take a wait-and-see approach to building transactionality, solutions aligned with the three specifications seeding WS-TX deliberations will likely provide the path of least resistance to standards compliance.
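To make the compensation idea concrete, here is a minimal sketch of a multi-source write that records an "undo" action for each successful write and replays those actions in reverse if a later write fails. The in-memory dictionaries standing in for data sources and the `fail_on` parameter (a hook for simulating a failure) are assumptions for illustration; this is not an implementation of the WS-TX specifications.

```python
from typing import Callable, Dict, List

Store = Dict[str, str]


def write_all(stores: List[Store], key: str, value: str,
              fail_on: int = -1) -> bool:
    """Write key=value to every store; compensate earlier writes on failure.

    fail_on simulates a rejected write at the store with that index
    (hypothetical test hook).
    """
    compensations: List[Callable[[], None]] = []
    try:
        for index, store in enumerate(stores):
            if index == fail_on:
                raise RuntimeError(f"store {index} rejected the write")
            # Remember the prior state before overwriting it.
            had_key, previous = key in store, store.get(key)
            store[key] = value

            def undo(store=store, had_key=had_key, previous=previous) -> None:
                # Compensation: a second write that restores the old value.
                if had_key:
                    store[key] = previous
                else:
                    store.pop(key, None)

            compensations.append(undo)
    except RuntimeError:
        # Undo the completed writes in reverse order.
        for undo in reversed(compensations):
            undo()
        return False
    return True
```

Note that, unlike a true rollback, the intermediate state is visible to other readers between the original write and its compensation, which is one reason the approach is considered a stopgap.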
Quality of Service
With all the data access
operations going through this data services layer, a major concern is
the potential bottleneck at this layer that may limit scalability. The
obvious way to resolve this problem is to create a clustered
environment with multiple instances of this data services layer.
Clustering brings complexities that depend on whether the enterprise is using a purely federated approach or has some level of data replication. If using a purely federated approach, then it can be simple to have a cluster with multiple instances. However, the architecture must still address the issue of affinity for a particular instance - especially in the case of inter-service transactions. The architecture must address questions such as: Are all operations that are part of a transaction forced to go to the same data service instance? Can different operations that use different data service instances still be part of the transaction?
A simple solution is to require all operations in a single transaction to interact with a single service instance. However, this solution isn't without its disadvantages since it can affect how well the load is distributed across the cluster. With some replication, clustering becomes more difficult. In addition to the server affinity issue, the architecture must include a partitioning strategy. This strategy answers questions such as: Do all instances of the data services allow access to all the data? Or are data services partitioned so that only certain instances allow access to certain data?
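The single-instance rule can be illustrated with a small routing function. This is a sketch under simplifying assumptions: cluster membership is fixed, instances are identified by hypothetical names such as `ds-1`, and a stable hash of the transaction identifier selects the instance. Requests outside a transaction are spread by their own request identifier.

```python
import hashlib
from typing import List, Optional


def pick_instance(instances: List[str], request_id: str,
                  transaction_id: Optional[str] = None) -> str:
    """Choose a data service instance for a request.

    Requests in a transaction are routed by transaction id, so they all
    reach the same instance; other requests are spread by request id.
    """
    key = transaction_id if transaction_id is not None else request_id
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return instances[int.from_bytes(digest[:8], "big") % len(instances)]


nodes = ["ds-1", "ds-2", "ds-3"]
```

The trade-off mentioned above shows up directly: a long or heavy transaction pins all of its work to one instance, however busy that instance already is.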
Data Access and Usage Patterns
It's important to note that different applications have different data access and usage patterns. Some applications can produce many transactions but access only a small amount of data in each transaction. For other applications, the transaction throughput can be small but the volume of data that's accessed can be very large. Tuning data source performance for each of these patterns calls for very different approaches. When using a data services solution to provide centralized access to enterprise data sources, the enterprise must accommodate all the various access and usage patterns of the applications that will be integrated with this solution. Tuning the infrastructure to support a single application's performance requirements is complicated; trying to tune it to adequately support multiple patterns of use and access will be even more difficult. Often, there will be conflicting configurations - something that optimizes the performance of one application will degrade the performance of another. The enterprise should analyze and model the access and use patterns of the applications that will be using the data services and ensure that well-defined performance criteria for each scenario have been developed. Additionally, enough time should be planned for testing the performance of a particular solution with simulations that reflect the access and usage patterns that are common to the enterprise environment.
Summary
Harmonizing data assets has always been a
challenging problem; the problems and urgency are further exacerbated
when migrating to an SOA. Developing a strategy for handling this kind
of transition is essential to properly enabling data access in an
enterprise SOA environment. By developing appropriate requirements and
use cases and by analyzing data assets and data usage, organizations
can better understand the breadth and depth of their data integration
issues and begin to take steps to address them. Ultimately, every
organization must develop a strategy tailored to its specific needs,
but the overall approach described in this article provides guidance in
understanding what types of questions should be asked and how to
leverage possible technology solutions to address the resulting issues
that are identified. This guidance will enable organizations to fully
leverage and exploit their most important strategic asset: their data.
© 2008 SYS-CON Media Inc.